New TALE-protein scaffolds and uses thereof

ABSTRACT

The present invention relates to new Transcription Activator-Like Effector proteins and more particularly new Transcription Activator-Like Effector Nucleases (TALENs) that can efficiently target and process nucleic acids. The present invention also concerns methods to use these new Transcription Activator-Like Effector proteins. The present invention also relates to vectors, compositions and kits in which Transcription Activator-Like Effector proteins of the present invention are used.

FIELD OF THE INVENTION

The present invention relates to new Transcription Activator-Like Effector proteins and more particularly new Transcription Activator-Like Effector Nucleases (TALENs) that can efficiently target and process nucleic acids. The present invention also concerns methods to use these new Transcription Activator-Like Effector proteins. The present invention also relates to vectors, compositions and kits in which Transcription Activator-Like Effector proteins of the present invention are used.

BACKGROUND OF THE INVENTION

Transcription activator-like effector (TAL effector) proteins have recently emerged as an alternative tool for genome modification. Although meganucleases and Zinc Finger proteins have proven to be efficient tools for precise manipulation of the genome, one of the major limitations of these technologies is the difficulty and cost involved in their engineering. The promise of TAL effector scaffolds resides in the simplicity of the interactions between the protein and its DNA binding site, which puts this technology within the reach of any laboratory.

Natural TAL effectors are produced by phytopathogenic bacteria and function upon infection as transcription activators of plant genes [for review see (Bogdanove, Schornack et al. 2010; Christian, Cermak et al. 2010)]. Since the isolation of the first TAL effector gene (Bonas, Stall et al. 1989), the presence of nearly identical repetitive motifs within the central domain has raised questions. This central domain has now been shown to be responsible for DNA recognition through a new type of DNA-binding domain (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). Each repeat is usually made of 33 or 34 amino acids and mediates the recognition of one nucleotide of the DNA target through two critical amino acids located at positions 12 and 13 of the repeat. These two hypervariable positions are referred to as the “repeat-variable di-residue” (RVD). More than 15 different RVDs have been described to date; however, HD, NG, NI, NN and NK are the most prevalent RVDs and associate preferentially with the nucleotides C, T, A, G/A and G, respectively. Thus, the discovery of this simple code, in which one RVD preferentially binds one nucleotide and does not seem to be influenced by its neighboring repeats, allowed the a priori design of new sequential associations of RVDs with novel DNA binding specificity (Boch, Scholze et al. 2009). This finding triggered the interest of the scientific community in TAL effector proteins as a potential tool for genome engineering, and it was not long before the first TAL nucleases (TALENs), consisting of natural or custom TAL effectors fused to the catalytic domain of the FokI nuclease, were made (Christian, Cermak et al. 2010; Miller, Tan et al. 2010; Cermak, Doyle et al. 2011; Li, Huang et al. 2011; Li, Huang et al. 2011).
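By way of non-limiting illustration, the one-RVD-per-nucleotide code described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `design_rvd_array` and the `g_rvd` parameter are hypothetical, and only the five most prevalent RVDs named above are used.

```python
def design_rvd_array(target, g_rvd="NN"):
    """Return one RVD per nucleotide of `target` (hypothetical helper).

    `g_rvd` selects the RVD used for G: "NN" (associated with G/A) or
    "NK" (associated with G), as listed in the text.
    """
    if g_rvd not in ("NN", "NK"):
        raise ValueError("g_rvd must be 'NN' or 'NK'")
    # Preferential associations from the text: HD-C, NG-T, NI-A, NN-G/A, NK-G.
    base_to_rvd = {"C": "HD", "T": "NG", "A": "NI", "G": g_rvd}
    rvds = []
    for base in target.upper():
        if base not in base_to_rvd:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        rvds.append(base_to_rvd[base])
    return rvds


if __name__ == "__main__":
    # Hypothetical 4-nucleotide target, for illustration only.
    print("-".join(design_rvd_array("CTAG")))
```

The choice between NN and NK for G is left to the caller because, according to the text, NN is associated with both G and A whereas NK is associated with G.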

Shortly after the first demonstrations that the targets of new TAL effectors could be predicted and that custom TALEs could function as transcription factors (Romer, Recht et al. 2009; Boch, Scholze et al. 2009; Moscou and Bogdanove 2009), the first study on custom TALENs as reagents for genome engineering was reported (Christian, Cermak et al. 2010). Using the molecular strategy used for ZFNs, i.e. a pair of TAL effectors fused to FokI nuclease catalytic domains and binding two nearby DNA sequences in opposite orientations, the authors showed that (i) specific custom-made endonucleases could be produced and (ii), using extrachromosomal assays, that they efficiently induced homologous recombination between two inverted repeats. The length of the spacer between the two DNA binding domains was also partly addressed in this pioneering work, although a later study (Miller, Tan et al. 2010) shed more light on that issue. By analyzing 20 previously reported TAL effectors (Moscou and Bogdanove 2009) together with their own data, the same team also published guidelines to help the design of de novo TALENs (Cermak, Doyle et al. 2011): the targeted sequence must start with a T, and T and A should be disfavored at positions 1 and 2, respectively. They also found a strong bias towards the RVD NG at the last position of the repeat arrays. Finally, the target DNA should have a low G content (9±8%). The robustness of these rules is not yet established. So far, the best-established limitation when designing a TALEN against a chosen sequence is the requirement for a T at the beginning of each target DNA sequence. This essential T is not specified by an RVD: it interacts directly with the N-terminal domain of the protein and thus is not governed by the so-called RVD code.
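As a non-limiting illustration, the published guidelines summarized above can be expressed as a simple checker. The following Python sketch rests on several assumptions that are not stated in the text: the function name `check_guidelines` is hypothetical, the leading T is counted as position 0, the G content is computed over the whole site, and the G threshold of 17% is simply the mean plus one deviation of the quoted 9±8%. Since the robustness of these rules is not yet established, the function only returns warnings.

```python
def check_guidelines(site, max_g_fraction=0.17):
    """Return a list of guideline warnings for one binding site (sketch)."""
    site = site.upper()
    if not site:
        raise ValueError("empty site")
    warnings = []
    # Assumption: the leading T bound by the N-terminal domain is position 0.
    if site[0] != "T":
        warnings.append("site does not start with T")
    if len(site) > 1 and site[1] == "T":
        warnings.append("T at position 1 is disfavored")
    if len(site) > 2 and site[2] == "A":
        warnings.append("A at position 2 is disfavored")
    # Bias towards RVD NG (recognizing T) at the last position of the array.
    if site[-1] != "T":
        warnings.append("last position is not T")
    # Assumption: 0.17 = 9% + 8%, an illustrative threshold only.
    if site.count("G") / len(site) > max_g_fraction:
        warnings.append("G content is high")
    return warnings


if __name__ == "__main__":
    # Hypothetical site, for illustration only.
    for warning in check_guidelines("ATAGGGGA"):
        print(warning)
```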

ZFNs are classically heterodimeric proteins that bind two DNA sequences separated by 6 bp. The TALENs described so far were all designed following the same architecture, i.e. TALENs act as heterodimeric proteins in which a nuclease catalytic domain such as FokI is fused to the TALE C-terminal region. Thus, the optimal length of the DNA sequence separating the two binding domains had to be determined. While ZFN DNA targets contain almost exclusively 6 bp intervening sequences, TALENs appear to tolerate a much wider range of spacer lengths and, as expected, this range appears to depend on the TALE scaffold used. As mentioned above, natural TAL effector proteins are made of RVD arrays that confer specificity, flanked by an N-terminal peptide sequence involved in cellular trafficking and a C-terminal domain that contains the trans-activator domain and nuclear localization sequences (NLS). Early work on TALEs had already demonstrated that 152 amino acids could be deleted from the N-terminal domain without affecting the protein activity (Szurek, Rossier et al. 2002; Gurlebeck, Szurek et al. 2005). Obviously, for nuclease purposes, the trans-activation domain appears superfluous. The early study performed by Christian et al. (Christian, Cermak et al. 2010) used two BamHI restriction sites located in the N- and C-terminal domains to truncate the protein. Without further investigation, the authors were able to show that this design yielded active TALENs. Furthermore, using this scaffold, a spacer of 15 nucleotides was optimal, although 18 or 24 bp were also possible. In a yeast assay, this design achieved activity similar to that observed with ZFNs. The best analysis of scaffold optimization available today was performed by Miller et al. (Miller, Tan et al. 2010), who used TAL effectors lacking the first 152 amino acids and tested a combination of C-terminal truncations of the TALE for homodimeric TALEN activity against targets bearing spacers of various lengths (from 2 to 24 bp). A spacer length below 10 nucleotides did not allow efficient cleavage in vitro, nor did the C-terminal truncation bearing the first 95 amino acids of the C-terminal domain. Moreover, a TALEN bearing the first 28 residues of the C-terminal domain showed nuclease activity in vitro when tested on targets comprising spacers from 10 to 24 bp, with maximal activity for spacers of 12-13 bp. Spacers shorter than 8 bp did not allow significant cleavage activity. Even though some guidelines have been described (Cermak, Doyle et al. 2011) to help design active TALENs, too few data are available today to confirm their benefits.
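To make the target architecture discussed above concrete, the following Python sketch builds a homodimeric target (one binding site, a spacer, and the same site in the opposite orientation) and measures the spacer between two sites. It is a minimal illustration only: the function names, the example site and the example spacer are hypothetical and are not sequences from the present specification.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def homodimeric_target(site, spacer):
    """Binding site, spacer, then the same site on the opposite strand."""
    return site.upper() + spacer.upper() + reverse_complement(site)


def spacer_length(target, left_site, right_site):
    """Spacer length in bp between two sites bound in opposite orientations.

    Returns None if either site is not found in `target`.
    """
    target = target.upper()
    start = target.find(left_site.upper())
    if start < 0:
        return None
    left_end = start + len(left_site)
    right_start = target.find(reverse_complement(right_site), left_end)
    if right_start < 0:
        return None
    return right_start - left_end


if __name__ == "__main__":
    site = "TACGTCCA"                 # hypothetical binding site
    spacer = "ACGTACGTACGTACG"        # hypothetical 15 bp spacer
    target = homodimeric_target(site, spacer)
    print(target, spacer_length(target, site, site))
```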

The inventors have developed a new type of TAL effector protein, and particularly a new type of TALEN, that can be engineered to specifically recognize and efficiently process a target nucleic acid, overcoming the current limitations.

BRIEF SUMMARY OF THE INVENTION

In a general aspect, the present invention relates to new Transcription Activator-Like Effector proteins and more particularly new Transcription Activator-Like Effector Nucleases (TALENs) that can efficiently target and process nucleic acids. The present invention also concerns methods to use these new Transcription Activator-Like Effector proteins for various applications. In another aspect, the present invention also concerns the creation of functional single-polypeptide fusion proteins, i.e. chimeric proteins derived from a Transcription Activator-Like Effector, for simple and efficient vectorization. In another aspect, the present invention also relates to vectors, compositions and kits in which chimeric proteins of the present invention derived from a Transcription Activator-Like Effector are used.

BRIEF DESCRIPTION OF THE FIGURES AND TABLES

In addition to the preceding features, the invention further comprises other features which will emerge from the description which follows, as well as from the appended drawings. A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following Figures in conjunction with the detailed description below.

FIG. 1: General description of a chimeric protein according to the present invention.

FIG. 2: Activity of AvrBs3-derived TALEN in yeast (30° C.); AvrBs3-derived TALEN is represented by a black bar and negative control (empty vector) and positive control (I-SceI meganuclease) are represented by grey bars. Activities are normalized to the positive control.

FIG. 3: Activity of AvrBs3-derived TALEN in mammalian cells (extrachromosomal assay in CHO-K1); AvrBs3-derived TALEN is represented by a black curve, negative control (empty vector) by a dark grey curve and positive control (I-SceI meganuclease) by a light grey curve.

FIG. 4: Activity of N152 AvrBs3-derived TALEN in yeast (30° C.); truncated variants are represented by grey bars and AvrBs3-derived TALEN (control wt) is represented by a black bar. Activities are normalized to the AvrBs3-derived TALEN (Control wt) (SEQ ID NO: 5) activity on its 15 bp target (Avr15) (SEQ ID NO: 6).

FIG. 5: Impact of truncations of the TALE C-terminal domain on AvrBs3-derived TALEN in yeast; truncated variants are represented by grey bars and AvrBs3-derived TALEN (control wt) is represented by a black bar; Activities are normalized to the AvrBs3-derived TALEN (Control wt) (SEQ ID NO: 5) activity on its 15 bp target (Avr15) (SEQ ID NO: 6).

FIG. 6: Activity in yeast of AvrBs3-derived TALENs comprising truncations of the C-terminal domain against targets of various spacer lengths. Effect of spacer length on:

A, AvrBs3 TALEN having the C-terminal domain truncated after position E886 (C0) (SEQ ID NO: 19); C0 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no detectable activity.

B, AvrBs3 TALEN having the C-terminal domain truncated after position P897 (C11) (SEQ ID NO: 20); C11 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

C, AvrBs3 TALEN having the C-terminal domain truncated after position G914 (C28) (SEQ ID NO: 21); C28 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

D, AvrBs3 TALEN having the C-terminal domain truncated after position L926 (C40) (SEQ ID NO: 22); C40 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

E, AvrBs3 TALEN having the C-terminal domain truncated after position D950 (C64) (SEQ ID NO: 23); C64 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

F, AvrBs3 TALEN having the C-terminal domain truncated after position T1003 (C115) (SEQ ID NO: 24); C115 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

G, AvrBs3 TALEN having the C-terminal domain truncated after position E1057 (C172) (SEQ ID NO: 25); C172 truncated variant is represented by grey bars and AvrBs3-derived TALEN is represented by black bars. Activities are normalized to the AvrBs3-derived TALEN (Control wt) activity on its 15 bp target (Avr15). * indicates no activity over negative control.

FIG. 7: Impact of nucleotide identity at position n of the target on the AvrBs3-derived TALEN activity (in yeast); Control target Avr15 with a T at position n is represented by a black bar. Activities are normalized to the AvrBs3-derived TALEN (Control wt) (SEQ ID NO: 5) activity on its 15 bp target (Avr15) (SEQ ID NO: 6).

FIG. 8: Activity of engineered TALENs in mammalian cells (extrachromosomal assay in CHO-K1); DMDT2.1 TALEN (SEQ ID NO: 180, SEQ ID NO: 186, SEQ ID NO: 189; example 6) is represented by a dark grey curve, ILRGT2.1 TALEN (SEQ ID NO: 181, SEQ ID NO: 187, SEQ ID NO: 190; example 6) is represented by a black curve, HBBT1.1 TALEN (SEQ ID NO: 182, SEQ ID NO: 188, SEQ ID NO: 192; example 6) is represented by a dark grey curve, negative control (empty vector) by a light grey dashed curve and positive control (I-SceI meganuclease) by a black dashed curve.

FIG. 9: Schematic of chimeric protein configurations according to the invention.

FIG. 10: Schematic of the method for optimizing the control of double-stranded break activity of a chimeric protein according to the invention.

FIG. 11: Schematic of the method for increasing the number of targets that can be reached by a chimeric protein according to the invention.

FIG. 12: Activity of TALE-AvrBs3::TevI in yeast (37° C.). The negative control consists in a TALEN without any RVDs. n.d. indicates no detectable activity, + indicates an activity over 0.3 in yeast assay and +++ indicates an activity over 0.7 in yeast assay (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 13: Activity of TALE-AvrBs3::TevI in mammalian cells (extrachromosomal assay in CHO-K1). pCLS8993 (SEQ ID NO: 457) is represented by a black bar and pCLS8994 (SEQ ID NO: 458) is represented by a dark grey bar. Negative control (empty vector) is represented by a white bar and positive control (I-SceI meganuclease) by a light grey bar. Data are normalized relative to the positive control.

FIG. 14: Activity of TALE-AvrBs3::NucA in yeast (37° C.). The negative control is a target lacking a recognition site (neg. ctrl.: SEQ ID NO: 472). Compact is a target having only one recognition site (SEQ ID NO: 468). n.d. indicates no detectable activity, + indicates an activity over 0.3 in yeast assay at 37° C.; ++ indicates an activity over 0.5 in yeast assay at 37° C. and +++ indicates an activity over 0.7 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 15: Activity of TALE-AvrBs3::ColE7 in yeast (37° C.). The negative control is a target lacking a recognition site (neg. ctrl.: SEQ ID NO: 472). Compact is a target having only one recognition site (SEQ ID NO: 468). n.d. indicates no detectable activity, + indicates an activity over 0.3 in yeast assay at 37° C.; ++ indicates an activity over 0.5 in yeast assay at 37° C. and +++ indicates an activity over 0.7 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIGS. 16 to 19: Activity of asymmetrical TALENs in yeast (37° C.). n.d. indicates no detectable activity at 37° C., +/− indicated an activity above 0.3 in yeast assay at 37° C.; + indicated an activity over 0.3 in yeast assay at 37° C.; ++ indicated an activity over 0.5 in yeast assay at 37° C.; +++ indicated an activity over 0.75 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 20: Activity of TALE-AvrBs3::EndoT7 in yeast (37° C.). n.d. indicates no detectable activity at 37° C., + indicated an activity over 0.3 in yeast assay at 37° C.; ++ indicated an activity over 0.5 in yeast assay at 37° C.; +++ indicated an activity over 0.75 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 21: Activity of colE7 (SEQ ID NO: 340), EndoT7 (SEQ ID NO: 363) and I-TevI (SEQ ID NO: 349) catalytic heads containing TALEN with various polypeptide linker in yeast (37° C.). Compact is a target having only one recognition site (SEQ ID NO: 224). n.d. indicates no detectable activity at 37° C., + indicated an activity over 0.3 in yeast assay at 37° C.; ++ indicated an activity over 0.5 in yeast assay at 37° C.; +++ indicated an activity over 0.75 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 22: Activity of TALEN having various polypeptide linkers (37° C.). The negative control is a target lacking recognition sites. n.d. indicates no detectable activity at 37° C., +/− indicated an activity above 0.7 in yeast assay at 37° C.; +++ indicated an activity over 0.70 in yeast assay at 37° C. (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 23: TAL repeat array assembly and subcloning into yeast expression plasmids. A. Legend of materials used for TAL repeat assembly. B. Immobilization of the first biotinylated TAL repeat fragment on a streptavidin-coated solid support and ligation to a second TAL repeat harboring SfaNI-compatible overhangs (BbvI overhangs displayed in red). C. Consecutive ligation/restriction of TAL repeats to generate the complete TAL repeat array. D. SfaNI digestion of the TAL repeat array. E. BbvI digestion and recovery of the TAL repeat array. Subcloning of the TAL repeat array into yeast expression plasmids harboring the N-terminal domain of the AvrBs3 TAL effector and the first forty amino acids of its C-terminal domain fused to the FokI type IIS restriction endonuclease.

FIG. 24. Influence of TAL repeat number on TALEN activity. TALEN activities of 52 different TALENs (SEQ ID NO: 507-558) bearing from 9.5 to 15.5 TAL repeats were determined. The mean values of TALEN activities are displayed as a function of TAL repeat number; error bars represent the standard deviation of activities of TALENs bearing the same number of TAL repeats.

FIG. 25. Influence of different single protein/DNA mismatches at position N-1 or N-2 on TALEN activity as a function of TAL repeat number.

A. Presentation of the different components constituting a TALEN.

B. Presentation of a homodimeric RAGT2.4 target (SEQ ID NO: 601).

C. Examples of RAGT2.4 TALENs bearing 15.5 or 11.5 Tal repeats (SEQ ID NO: 617 or 622) along with their respective DNA targets (top and bottom respectively).

D. Experimental results reporting RAGT2.3 and RAGT2.4 TALEN activities as a function of TAL repeat number and nature of mismatch at N and N-1 positions.

FIG. 26. Influence of C-terminal domain substitution by polypeptide linkers 8, 27 and 35 on AvrBs3 TALEN nuclease activity in yeast. AvrBs3 TALENs bearing polypeptide linkers 8, 27 and 35 (SEQ ID NO: 141, 160 and 168) as C-terminal domain were assayed against AvrBs3 homodimeric targets bearing DNA spacers from 5 to 30 bp. Their activities in yeast are displayed as a function of spacer length.

FIG. 27: Activities for novel variations of the TALE::FokI scaffold. The negative control consists in a TALEN without any RVDs. n.d. indicates no detectable activity, + indicates an activity between 0.3 and 0.5 in our assay, ++ indicates an activity between 0.5 and 0.7 in our assay and +++ indicates an activity over 0.7 in our assay (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

FIG. 28: Activities of combination TALE::FokI and FokI::TALE in yeast. The negative control consists in a TALEN without any RVDs. n.d. indicates no detectable activity, ++ indicates an activity between 0.5 and 0.7 in our assay and +++ indicates an activity over 0.7 in our assay (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

Table 3: List of AvrBs3 targets with various spacer lengths (SEQ ID NO: 220 to 255).

Table 4: List of AvrBs3 targets with all combination at position n (SEQ ID NO: 171 to 174).

Table 5: List of AvrBs3/RAGT2 (heterodimer) targets with various spacer lengths (SEQ ID NO: 256 to 291).

Table 6: Activities of 27 custom TALEN tested as homodimers, in four different scaffolds (full wt C-terminal domain, C0 truncated C-terminal domain, C11 truncated C-terminal domain and C40 truncated C-terminal domain); (n.a: non available; n.d: non detected) (SEQ ID NO: 26 to 133).

Table 7: Binding and target sequences of 27 custom TALEN (homodimers) (SEQ ID NO: 193 to 219).

Table 8: List of AvrBs3 targets with various spacer lengths (SEQ ID NO: 220 to 255) including a target with only one recognition site (compact, SEQ ID NO: 468) and a negative control target (neg. ctrl., SEQ ID NO: 472) consisting in a target without any recognition site.

Table 9: List of AvrBs3 targets containing two identical recognition sequences juxtaposed with the 5′ ends proximal and separated by “spacer” DNA ranging from 5 to 35 bps (SEQ ID NO: 629 to 659).

Table 10: List of RagT2-R/AvrBs3 hybrid targets containing two different recognition sequences juxtaposed with the 3′ end of the first (RagT2-R) proximal to the 5′ end of the second (AvrBs3) and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 666 to 701).

DETAILED DESCRIPTION OF THE INVENTION

Unless specifically defined herein, all technical and scientific terms used have the same meaning as commonly understood by a skilled artisan in the fields of gene therapy, biochemistry, genetics, and molecular biology.

All methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, with suitable methods and materials being described herein. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will prevail. Further, the materials, methods, and examples are illustrative only and are not intended to be limiting, unless otherwise specified.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, for example, Current Protocols in Molecular Biology (Frederick M. Ausubel, 2000, Wiley and Sons Inc, Library of Congress, USA); Molecular Cloning: A Laboratory Manual, Third Edition, (Sambrook et al, 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al. U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the series, Methods In Enzymology (J. Abelson and M. Simon, eds.-in-chief, Academic Press, Inc., New York), specifically, Vols. 154 and 155 (Wu et al. eds.) and Vol. 185, “Gene Expression Technology” (D. Goeddel, ed.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); and Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).

In a general aspect, the present invention relates to new Transcription Activator-Like Effector proteins and more particularly new Transcription Activator-Like Effector Nucleases (TALENs) that can efficiently target and process nucleic acids.

A first aspect of the present invention is a chimeric protein constituted by a core scaffold comprising a DNA binding domain and a protein domain to process a nucleic acid target sequence.

In a first embodiment, the present invention relates to a chimeric protein derived from a Transcription Activator-Like Effector (TALE) comprising:

(i) A core scaffold comprising Repeat Variable Dipeptide regions (RVDs) having specificity to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process;

(ii) A catalytic domain to process said nucleic acid target sequence.

In another embodiment, said chimeric protein further comprises a peptidic linker to fuse said catalytic domain to said core scaffold.

Another embodiment is a chimeric protein constituted by a core scaffold comprising a DNA binding domain and a protein domain to process a nucleic acid target sequence. A preferred embodiment of this first aspect is a chimeric protein derived from a Transcription Activator-Like Effector (TALE) comprising:

(i) A core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A;

(ii) A catalytic domain to process said nucleic acid target sequence.
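As a non-limiting illustration, the RVD-to-nucleotide associations listed in item (i) above can be tabulated and used to predict the sequence recognized by a given set of RVDs. In the following Python sketch, the names `RVD_SPECIFICITY` and `predicted_site` are hypothetical, and the use of IUPAC ambiguity codes (R for A or G, N for any nucleotide) to write degenerate positions is a presentation choice made here, not part of the original list.

```python
# RVD -> recognized nucleotide(s), transcribed from the list in item (i).
RVD_SPECIFICITY = {
    "HD": "C", "NG": "T", "NI": "A", "NN": "AG", "NS": "ACGT",
    "HG": "T", "IG": "T", "NK": "G", "HA": "C", "ND": "C",
    "HI": "C", "HN": "G", "NA": "G", "SN": "AG", "YG": "T",
    "TL": "A", "VT": "AG", "SW": "A",
}

# IUPAC ambiguity codes for the nucleotide sets that occur in the table.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "ACGT": "N"}


def predicted_site(rvds):
    """Predict the recognized sequence for an ordered list of RVDs (sketch)."""
    site = []
    for rvd in rvds:
        if rvd not in RVD_SPECIFICITY:
            raise ValueError(f"RVD not in the listed code: {rvd!r}")
        site.append(IUPAC[RVD_SPECIFICITY[rvd]])
    return "".join(site)


if __name__ == "__main__":
    # Hypothetical RVD array, for illustration only.
    print(predicted_site(["HD", "NN", "NS", "NG", "VT"]))
```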

In another embodiment, the chimeric protein of the present invention is derived from any naturally occurring TAL effector, such as those described in Bogdanove et al. (Boch and Bonas 2010; Bogdanove, Schornack et al. 2010) and listed in Boch et al. (Boch and Bonas 2010). In a preferred embodiment, the chimeric protein of the present invention is derived from any TAL effector of plant pathogenic bacteria in the genus Xanthomonas, as listed in Boch et al. (Boch and Bonas 2010) as a non-limiting example. In another embodiment, only one part of the core scaffold is derived from a TAL effector; as an illustrative example, only said set of Repeat Variable Dipeptide regions is derived from a TAL effector.

In another embodiment, each RVD of said core scaffold is made of 30 to 42 amino acids, more preferably 33 or 34, wherein two critical amino acids located at positions 12 and 13 mediate the recognition of one nucleotide of said nucleic acid target sequence; two equivalent critical amino acids can be located at positions other than 12 and 13, especially in RVDs longer than 33 or 34 amino acids. Preferably, RVDs associated with recognition of the different nucleotides are HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. More preferably, RVDs associated with recognition of the nucleotides C, T, A, G/A and G respectively are selected from the group consisting of NN or NK for recognizing G, HD for recognizing C, NG for recognizing T and NI for recognizing A, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. In another embodiment, RVDs associated with recognition of the nucleotide C are selected from the group consisting of N* and RVDs associated with recognition of the nucleotide T are selected from the group consisting of N* and H*, where * denotes a gap in the repeat sequence that corresponds to a lack of amino acid residue at the second position of the RVD. In another embodiment, critical amino acids 12 and 13 can be mutated towards other amino acid residues in order to modulate their specificity towards nucleotides A, T, C and G and in particular to enhance this specificity. By other amino acid residues is meant any of the twenty natural amino acid residues or unnatural amino acid derivatives.

In another embodiment, said core scaffold of the present invention comprises between 8 and 30 RVDs. More preferably, said core scaffold of the present invention comprises between 8 and 20 RVDs; even more preferably, 15 RVDs.

In another embodiment, said core scaffold comprises an additional single truncated RVD made of 20 amino acids located at the C-terminus of said set of RVDs, i.e. an additional C-terminal half-RVD. In this case, said core scaffold of the present invention comprises between 8.5 and 30.5 RVDs, “0.5” referring to the previously mentioned half-RVD (or terminal RVD, or half-repeat). More preferably, said core scaffold of the present invention comprises between 8.5 and 20.5 RVDs; even more preferably, 15.5 RVDs. In a preferred embodiment, said half-RVD is in a core scaffold context which allows a lack of specificity of said half-RVD toward nucleotides A, C, G, T. In a more preferred embodiment, said half-RVD is absent.
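As a non-limiting illustration of the “.5” notation used above, the following Python sketch computes the RVD number of a core scaffold and checks it against the stated ranges. The function names `rvd_number` and `within_stated_range` are hypothetical, and the sketch assumes that the half-RVD simply adds 0.5 to the number of full RVDs.

```python
def rvd_number(n_full_rvds, has_half_rvd):
    """RVD number in the '.5' notation: full RVDs plus 0.5 for a half-RVD."""
    return n_full_rvds + (0.5 if has_half_rvd else 0.0)


def within_stated_range(n_full_rvds, has_half_rvd, preferred=False):
    """Check the stated ranges: 8 to 30 RVDs (8.5 to 30.5 with a half-RVD),
    or, for the more preferred range, 8 to 20 RVDs (8.5 to 20.5)."""
    upper = 20 if preferred else 30
    offset = 0.5 if has_half_rvd else 0.0
    return 8 + offset <= rvd_number(n_full_rvds, has_half_rvd) <= upper + offset


if __name__ == "__main__":
    print(rvd_number(15, True), within_stated_range(15, True, preferred=True))
```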

In another embodiment, said core scaffold of the present invention comprises RVDs of different origins. In a preferred embodiment, said core scaffold comprises RVDs originating from different naturally occurring TAL effectors. In another preferred embodiment, the internal structure of some RVDs of the core scaffold of the present invention is constituted by structures or sequences originating from different naturally occurring TAL effectors. In another embodiment, said core scaffold of the present invention comprises RVDs-like domains. RVDs-like domains have a sequence different from naturally occurring RVDs but have the same function and/or global structure within said core scaffold of the present invention. As non-limiting examples, said RVDs-like domains are protein domains selected from the group consisting of Puf RNA binding proteins and the Ankyrin super-family. Non-limiting examples of such proteins from which RVDs-like domains can be derived are given by SEQ ID NO: 398 and SEQ ID NO: 399, respectively corresponding to proteins fem-3 and aRep. Depending on the structural context and binding constraints, said core scaffold of the chimeric protein of the present invention comprises a mix of naturally occurring RVD structures and RVDs-like domains.

In another embodiment, said core scaffold of the present invention is entirely composed of RVDs-like domains that do not originate from pathogenic organisms. In this embodiment, said core scaffold of the chimeric protein of the present invention is totally artificial, i.e. without any RVDs-like domains derived from naturally occurring TAL effectors.

In another embodiment, are encompassed variants of naturally occurring RVDs enriching the code mentioned above by mutating the critical amino acids located at positions 12 and 13 towards other existing amino acids. Such mutations can also overcome nucleic acid modifications such as DNA alkylation (acetylation, methylation) as a non-limiting example; in this case, the core scaffold of the chimeric protein of the invention can have a higher selectivity for a methylated or unmethylated target sequence; in other words, said core scaffold can tolerate nucleic acid methylation or can be specific for a methylated target sequence. In another embodiment, are encompassed variants of naturally occurring RVDs that are mutated in other amino acids of structural importance. As a non-limiting example, the VVAIA and LLPVL amino acid motifs in RVDs may be of structural importance for hydrophobic contact between helices of adjacent RVDs and may represent good candidate motifs to mutate for modulating the intra-RVD interactions or the interdependency between adjacent RVDs in a set of repeated variable domains, and thus the affinity and specificity of subsequent TALENs. In another embodiment of the present invention are also encompassed RVD variants mutated at these residues to modify interactions between adjacent RVDs, resulting in a core scaffold of the present invention with more rigidity. Conversely, are also encompassed in the scope of the chimeric protein of the present invention RVD variants mutated at these residues to obtain a core scaffold with more flexibility. More or less rigidity in core scaffolds of the present invention makes it possible to enhance or decrease protein-protein interactions within the structure of the scaffold, particularly when adjacent RVDs in said core scaffold are from different origins; it also makes it possible to modify protein-nucleic acid interactions between said core scaffold of the chimeric protein of the present invention and its nucleic acid target. Modifications of protein-protein or protein-nucleic acid interactions can be quantified by measuring biochemical constants [affinity (Ka)/dissociation (Kd)/turnover (Kcat) constants] associated with such interactions/reactions.

In another embodiment, said core scaffold of the present invention comprises one additional domain at the N-terminus of said set of Repeat Variable Dipeptide regions. In another embodiment, said additional N-terminus domain is derived from the N-terminus domain of a naturally occurring TAL effector. In another embodiment, said additional N-terminus domain is the full-length N-terminus domain of a naturally occurring TAL effector. In another embodiment, said additional N-terminus domain is a variant of a naturally occurring TAL effector N-terminus domain. In another embodiment, said additional domain is a truncated variant of a naturally occurring TAL effector N-terminus domain. In another embodiment, said additional domain is a truncated version of AvrBs3 TAL effector. In another embodiment, said truncated version lacks at its C-terminus extremity the peptidic sequence that confers specificity to position 0 of the recognized and bound sequence, i.e. the “RVD0” repeat, named for a postulated 0th repeat that has only weak sequence similarity but a predicted structural similarity to the repeats in Bogdanove et al. (Bogdanove 2010 current opinion). In another embodiment, said truncated version lacks at least one amino acid residue selected from the group consisting of the first 152 N-terminal amino acid residues. In another embodiment, said truncated version lacks more than the first 152 amino acid residues.

In another embodiment, said additional N-terminus domain is a non-TAL effector originating domain. In another embodiment, said additional N-terminus domain is derived from a protein having non-specific nucleic acid binding characteristics. In one such embodiment, said additional N-terminus domain is derived from a protein having non-specific DNA binding characteristics. In another such embodiment, said additional N-terminus domain is derived from a protein having non-specific RNA binding characteristics. In another embodiment, said additional N-terminus domain is derived from a protein having specific nucleic acid binding characteristics such as, as non-limiting examples, meganucleases or zinc-finger proteins, or derivatives of those such as variants with only DNA binding activity. In another embodiment, said additional N-terminus domain is a chimeric domain comprising a TAL effector originating subdomain and a non-TAL effector originating subdomain.

In another embodiment, said additional N-terminus domain is a variant increasing the affinity of said core scaffold of the chimeric protein of the present invention toward its binding nucleic acid sequence. In another embodiment, said additional N-terminus domain is a variant which allows overcoming the sequence constraint associated with said RVD0, i.e. the necessity to have a T as the first base of the bound nucleic acid sequence. In a preferred embodiment, said additional N-terminus domain is a variant which allows changing this sequence constraint to A, G or C respectively. In a more preferred embodiment, said additional N-terminus domain is a variant which allows suppressing the sequence constraint associated with RVD0.
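The effect of the RVD0 constraint on the choice of binding sites can be illustrated by the following minimal sketch. It assumes, for illustration only, that a bound sequence is modelled as position 0 followed by one base per repeat; the function name `candidate_sites` and the flag `require_t0` are hypothetical, the flag set to False standing for a variant that suppresses the constraint.

```python
def candidate_sites(sequence, n_repeats, require_t0=True):
    """List (start, site) windows made of position 0 plus n_repeats bases.

    With require_t0=True only windows whose first base (position 0) is a T
    are kept; with require_t0=False every window is returned.
    """
    sequence = sequence.upper()
    width = n_repeats + 1  # position 0 plus one base per repeat
    sites = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if require_t0 and window[0] != "T":
            continue
        sites.append((start, window))
    return sites
```

With the constraint in place only a fraction of the windows of a given sequence remains available, which is the limitation that the variants described above are intended to relax.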

In another embodiment, said additional N-terminus domain of the core scaffold of the chimeric protein of the present invention comprises a localization sequence (or signal) which allows targeting said chimeric protein toward a given organelle within an organism, a tissue or a cell. Non-limiting examples of such localization signals are nuclear localization signals, chloroplastic localization signals or mitochondrial localization signals. In another embodiment, said additional N-terminus domain can comprise a nuclear export signal having the opposite effect of a nuclear localization signal to help targeting organelles such as chloroplasts or mitochondria. In the scope of the present invention are also encompassed additional N-terminus domains with a combination of several localization signals. Such a combination can be, as a non-limiting example, a nuclear localization signal and a tissue-specific signal to help address said chimeric protein of the present invention to the nucleus of tissue-specific cells.

In another embodiment, said additional N-terminal domain can be fused with a protein domain, a protein module, an antibody (or part of it) or a tag of interest, well-known in the art, for a specific application. In another embodiment, said additional N-terminal domain can be linked with a chemical molecule such as a small compound of interest for a defined application.

In a preferred embodiment, said additional domain at the N-terminus of said set of Repeat Variable Dipeptide regions of said core scaffold of the present invention is selected from the group consisting of SEQ ID NO: 292, SEQ ID NO: 293 and SEQ ID NO: 401 or derivatives thereof.

In another embodiment, said core scaffold of the present invention comprises one additional domain at the C-terminus of said set of Repeat Variable Dipeptide regions. In another embodiment, said additional C-terminus domain is derived from the C-terminus domain of a naturally occurring TAL effector. In another embodiment, said additional C-terminus domain is the full-length C-terminus domain of a naturally occurring TAL effector. In another embodiment, said additional C-terminus domain is a variant of a naturally occurring TAL effector C-terminus domain. In another embodiment, said additional domain is a truncated variant of a naturally occurring TAL effector C-terminus domain. In another embodiment, said truncated version is a C-terminus domain without Activation Domain (SEQ ID NO: 400 and 402). In another embodiment, said additional domain is a truncated version of AvrBs3 TAL effector. In another embodiment, said additional domain is truncated after position E886 (C0). In another embodiment, said additional domain is truncated after position P897 (C11; SEQ ID NO: 295). In another embodiment, said additional domain is truncated after position G914 (C28; SEQ ID NO: 296). In another embodiment, said additional domain is truncated after position L926 (C40; SEQ ID NO: 297). In another embodiment, said additional domain is truncated after position D950 (C64; SEQ ID NO: 298). In another embodiment, said additional domain is truncated after position R1000 (C115; SEQ ID NO: 299). In another embodiment, said additional domain is truncated after position D1059 (C172; SEQ ID NO: 300) (amino acid numbering refers to C-terminus domain of AvrBs3 TAL effector).

In another embodiment, said additional C-terminus domain is a non-TAL effector originating domain. In another embodiment, said additional C-terminus domain is derived from a protein having non-specific nucleic acid binding characteristics. In one such embodiment, said additional C-terminus domain is derived from a protein having non-specific DNA binding characteristics. In another such embodiment, said additional C-terminus domain is derived from a protein having non-specific RNA binding characteristics. In another embodiment, said additional C-terminus domain is derived from a protein having specific nucleic acid binding characteristics such as, as non-limiting examples, meganucleases or zinc-finger proteins, or derivatives of those such as variants with only DNA binding activity. In another embodiment, said additional C-terminus domain is a chimeric domain comprising a TAL effector originating subdomain and a non-TAL effector originating subdomain.

In another embodiment, said additional C-terminus domain is a variant increasing the affinity of said core scaffold of the chimeric protein of the present invention toward its binding nucleic acid sequence.

In another embodiment, said additional C-terminus domain of the core scaffold of the chimeric protein of the present invention comprises a localization sequence (or signal) which allows targeting said chimeric protein toward a given organelle within an organism, a tissue or a cell. Non-limiting examples of such localization signals are nuclear localization signals, chloroplastic localization signals or mitochondrial localization signals. In another embodiment, said additional C-terminus domain can comprise a nuclear export signal having the opposite effect of a nuclear localization signal to help targeting organelles such as chloroplasts or mitochondria. In the scope of the present invention are also encompassed additional C-terminus domains with a combination of several localization signals. Such a combination can be, as a non-limiting example, a nuclear localization signal and a tissue-specific signal to help address said chimeric protein of the present invention to the nucleus of tissue-specific cells.

In another embodiment, said additional C-terminal domain can be fused with a protein domain, a protein module, an antibody (or part of it) or a tag of interest, well-known in the art, for a specific application. In another embodiment, said additional C-terminal domain can be linked with a chemical molecule such as a small compound of interest for a defined application.

In a preferred embodiment, said additional domain at the C-terminus of said set of Repeat Variable Dipeptide regions of said core scaffold of the present invention is selected from the group consisting of SEQ ID NO: 295 to 300, SEQ ID NO: 400 and SEQ ID NO: 402 or derivatives thereof.

In another embodiment, said core scaffold of the chimeric protein according to the present invention comprises two additional domains respectively at the N-terminus and at the C-terminus of said set of Repeat Variable Dipeptide regions, as previously described.

In another embodiment, said chimeric protein according to the present invention comprises at least one peptidic linker to fuse a protein domain to said core scaffold previously described. In a preferred embodiment, said peptidic linker is flexible. In another preferred embodiment, said peptidic linker is structured. In a more preferred embodiment, said peptidic linker sequence is selected from the group consisting of NFS1, NFS2, CFS1, RM2, BQY, QGPSG, LGPDGRKA, 1a8h_1, 1dnpA_1, 1d8cA_2, 1ckqA_3, 1sbp_1, 1ev7A_1, 1alo_3, 1amf_1, 1adjA_3, 1fcdC_1, 1al3_2, 1g3p_1, 1acc_3, 1ahjB_1, 1acc_1, 1af7_1, 1heiA_1, 1bia_2, 1igtB_1, 1nfkA_1, 1au7A_1, 1bpoB_1, 1b0pA_2, 1c05A_2, 1gcb_1, 1bt3A_1, 1b3oB_2, 16vpA_6, 1dhx_1, 1b8aA_1, 1qu6A_1 and TAL1 to TAL37 which share SGGSGS stretches at both N and C-terminal ends and surround a variable region of 3 to 28 amino acids as listed in Table 1 below (SEQ ID NO: 301 to SEQ ID NO: 338 and SEQ ID NO: 134 to SEQ ID NO: 170 and SEQ ID NO: 479 to 485). In a more preferred embodiment, the peptidic linker that can link said core scaffold to said protein domain of the chimeric protein according to the present invention can be selected from the group consisting of TAL1 to TAL37 (SEQ ID NO: 134 to SEQ ID NO: 170). In the scope of the present invention is also encompassed the case where a peptidic linker is not needed to fuse said core scaffold with said protein domain in order to obtain a chimeric protein according to the present invention. In the scope of the present invention is also encompassed the case where more than one linker is needed to fuse several protein domains with said core scaffold according to the present invention. As non-limiting examples, two, three or four linkers can be used in the same chimeric protein according to the present invention.
In another embodiment, said peptidic linker contains one or several active domains which allow its deployment under stimulation. As a non-limiting example, said peptidic linker can contain a calmodulin domain that changes its conformation under calcium stimulation; other protein domains changing their conformation upon interaction with a specific metabolite can be used. As another non-limiting example, such peptidic linker according to the present invention can contain a light-sensitive domain which allows a change in peptidic linker structure from a folded inactive state toward an unfolded active state under light stimulation for example, or the reverse. Said peptidic linker can for example contain a first light-sensitive protein switch comprising a phytochrome-chromophore complex and a Phytochrome Interaction Factor (PIF), i.e. a second protein able to reversibly interact with said phytochrome-chromophore complex depending on the light activation/deactivation state. Other examples of active linkers can use small molecules such as Chemical Inducers of Dimerization (CID).

TABLE 1 List of peptidic linkers.

Name (PDB)  Amino acids  Length  Size    SEQ ID NO  Sequence
1a8h_1      285-287      3       6,636   301        NVG
1dnpA_1     130-133      4       7,422   302        DSVI
1d8cA_2     260-263      4       8,782   303        IVEA
1ckqA_3     169-172      4       9,91    304        LEGS
1sbp_1      93-96        4       10,718  305        YTST
1ev7A_1     169-173      5       11,461  306        LQENL
1alo_3      360-364      5       12,051  307        VGRQP
1amf_1      81-85        5       13,501  308        LGNSL
1adjA_3     323-328      6       14,835  309        LPEEKG
1fcdC_1     76-81        6       14,887  310        QTYQPA
1al3_2      265-270      6       15,485  311        FSHSTT
1g3p_1      99-105       7       17,903  312        GYTYINP
1acc_3      216-222      7       19,729  313        LTKYKSS
1ahjB_1     106-113      8       17,435  314        SRPSESEG
1acc_1      154-161      8       18,776  315        PELKQKSS
1af7_1      89-96        8       22,502  316        LTTNLTAF
1heiA_1     322-330      9       13,534  317        TATPPGSVT
1bia_2      268-276      9       16,089  318        LDNFINRPV
1igtB_1     111-119      9       19,737  319        VSSAKTTAP
1nfkA_1     239-248      10      13,228  320        DSKAPNASNL
1au7A_1     103-112      10      20,486  321        KRRTTISIAA
1bpoB_1     138-148      11      21,645  322        PVKMFDRHSSL
1b0pA_2     625-635      11      26,462  323        APAETKAEPMT
1c05A_2     135-148      14      23,819  324        YTRLPERSELPAEI
1gcb_1      57-70        14      27,39   325        VSTDSTPVTNQKSS
1bt3A_1     38-51        14      28,818  326        YKLPAVTTMKVRPA
1b3oB_2     222-236      15      20,054  327        IARTDLKKNRDYPLA
16vpA_6     312-332      21      23,713  328        TEEPGAPLTTPPTLHGNQARA
1dhx_1      81-101       21      42,703  329        ARFTLAVGDNRVLDMASTYFD
1b8aA_1     95-120       26      31,305  330        IVVLNRAETPLPLDPTGKVKAELDTR
1qu6A_1     79-106       28      51,301  331        ILNKEKKAVSPLLLTTTNSSEGLSMGNY
NFS1        -            20      -       332        GSDITKSKISEKMKGQGPSG
NFS2        -            23      -       333        GSDITKSKISEKMKGLGPDGRKA
CFS1        -            10      -       334        SLTKSKISGS
RM2         -            32      -       335        AAGGSALTAGALSLTAGALSLTAGALSGGGGS
BQY         -            27      -       336        AAGASSVSASGHIAPLSLPSSPPSVGS
QGPSG       -            5       -       337        QGPSG
LGPDGRKA    -            8       -       338        LGPDGRKA
TAL1        -            15      -       134        SGGSGSNVGSGSGSG
TAL2        -            20      -       135        SGGSGSLTTNLTAFSGSGSG
TAL3        -            22      -       136        SGGSGSKRRTTISIAASGSGSG
TAL4        -            17      -       137        SGGSGSVGRQPSGSGSG
TAL5        -            26      -       138        SGGSGSYTRLPERSELPAEISGSGSG
TAL6        -            38      -       139        SGGSGSIVVLNRAETPLPLDPTGKVKAELDTRSGSGSG
TAL7        -            21      -       140        SGGSGSTATPPGSVTSGSGSG
TAL8        -            21      -       141        SGGSGSLDNFINRPVSGSGSG
TAL9        -            21      -       142        SGGSGSVSSAKTTAPSGSGSG
TAL10       -            22      -       143        SGGSGSDSKAPNASNLSGSGSG
TAL11       -            23      -       144        SGGSGSPVKMFDRHSSLSGSGSG
TAL12       -            23      -       145        SGGSGSAPAETKAEPMTSGSGSG
TAL13       -            26      -       146        SGGSGSVSTDSTPVTNQKSSSGSGSG
TAL14       -            16      -       147        SGGSGSDSVISGSGSG
TAL15       -            33      -       148        SGGSGSARFTLAVGDNRVLDMASTYFDSGSGSG
TAL16       -            17      -       149        SGGSGSLQENLSGSGSG
TAL17       -            19      -       150        SGGSGSGYTYINPSGSGSG
TAL18       -            26      -       151        SGGSGSYKLPAVTTMKVRPASGSGSG
TAL19       -            16      -       152        SGGSGSLEGSSGSGSG
TAL20       -            16      -       153        SGGSGSIVEASGSGSG
TAL21       -            18      -       154        SGGSGSQTYQPASGSGSG
TAL22       -            27      -       155        SGGSGSIARTDLKKNRDYPLASGSGSG
TAL23       -            18      -       156        SGGSGSLPEEKGSGSGSG
TAL24       -            16      -       157        SGGSGSYTSTSGSGSG
TAL25       -            20      -       158        SGGSGSSRPSESEGSGSGSG
TAL26       -            17      -       159        SGGSGSLGNSLSGSGSG
TAL27       -            19      -       160        SGGSGSLTKYKSSSGSGSG
TAL28       -            33      -       161        SGGSGSTEEPGAPLTTPPTLHGNQARASGSGSG
TAL29       -            18      -       162        SGGSGSFSHSTTSGSGSG
TAL30       -            20      -       163        SGGSGSPELKQKSSSGSGSG
TAL31       -            40      -       164        SGGSGSILNKEKKAVSPLLLTTTNSSEGLSMGNYSGSGSG
TAL32       -            31      -       165        ELAEFHARYADLLLRDLRERPVSLVRGPDSG
TAL33       -            31      -       166        ELAEFHARPDPLLLRDLRERPVSLVRGLGSG
TAL34       -            26      -       167        ELAEFHARYADLLLRDLRERSGSGSG
TAL35       -            31      -       168        DIFDYYAGVAEVMLGHIAGRPATRKRWPNSG
TAL36       -            31      -       169        DIFDYYAGPDPVMLGHIAGRPATRKRWLGSG
TAL37       -            26      -       170        DIFDYYAGVAEVMLGHIAGRSGSGSG
Linker A    -            37      -       479        SIVAQLSRPDPALVSFQKLKLACLGGRPALDAVKKGL
Linker B    -            37      -       480        SIVAQLSRPDPAAVSAQKAKAACLGGRPALDAVKKGL
Linker C    -            37      -       481        SIVAQLSRPDPAVVTFHKLKLACLGGRPALDAVKKGL
Linker D    -            44      -       482        SIVAQLSRPDPAQSLAQELSLNESQIKIACLGGRPALDAVKKGL
Linker E    -            40      -       483        SIVAQLSRPDPALQLPPLERLTLDACLGGRPALDAVKKGL
Linker F    -            38      -       484        SIVAQLSRPDPAIHKKFSSIQMACLGGRPALDAVKKGL
Linker G    -            40      -       485        SIVAQLSRPDPAAAAATNDHAVAAACLGGRPALDAVKKGL
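As an illustration of how several entries of Table 1 relate to one another, the following sketch assembles a linker from a variable region and two flanking stretches. In the sequences listed in Table 1, entries such as TAL1, TAL4 and TAL14 read as SGGSGS, followed by a PDB-derived variable region, followed by SGSGSG; the helper name `flanked_linker` is hypothetical, and the sketch is not presented as the procedure by which the linkers were actually designed.

```python
N_FLANK = "SGGSGS"  # N-terminal stretch read from the TAL entries of Table 1
C_FLANK = "SGSGSG"  # C-terminal stretch read from the same entries


def flanked_linker(variable_region):
    """Return the variable region surrounded by the two flanking stretches."""
    if not 3 <= len(variable_region) <= 28:
        # The description states a variable region of 3 to 28 amino acids.
        raise ValueError("variable region must be 3 to 28 amino acids long")
    return N_FLANK + variable_region + C_FLANK


# Variable regions taken from Table 1: 1a8h_1, 1alo_3 and 1dnpA_1.
tal1 = flanked_linker("NVG")
tal4 = flanked_linker("VGRQP")
tal14 = flanked_linker("DSVI")
```

The three resulting strings correspond to the sequences listed for TAL1, TAL4 and TAL14, with lengths of 15, 17 and 16 amino acids respectively.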

In another embodiment, said chimeric protein according to the present invention comprises at least one protein domain or catalytic domain to process said nucleic acid target sequence. In another embodiment, the catalytic domain that is capable of processing said nucleic acid target sequence, when fused to said core scaffold according to the present invention, is fused to the N-terminus part of said core scaffold. In another preferred embodiment, said catalytic domain is fused to the C-terminus part of said core scaffold. In another embodiment, two catalytic domains are fused respectively to the N-terminus part and to the C-terminus part of said core scaffold. In the scope of the present invention is encompassed the fusion of one or several catalytic domains to said core scaffold, wherein said core scaffold may or may not comprise an additional domain at its N-terminus and/or at its C-terminus. As previously mentioned, one or several peptidic linkers can be added for said fusions between the different domains of the chimeric protein according to the present invention. By several catalytic domains and several peptidic linkers is meant two, three, four or five as non-limiting examples.

In a preferred embodiment, said catalytic domain has an activity selected from the group consisting of nuclease activity, polymerase activity, kinase activity, phosphatase activity, methylase activity, topoisomerase activity, integrase activity, transposase activity, ligase activity, helicase activity, recombinase activity.

In another preferred embodiment, the catalytic domain fused to the core scaffold of the present invention can be a transcription activator or repressor (i.e. a transcription regulator), or a protein that interacts with or modifies other proteins implicated in DNA processing. Non-limiting examples of DNA processing activities of said chimeric protein of the present invention include, for example, creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in DNA, controlling gene expression, and modifying chromatin structure.

In another more preferred embodiment, said catalytic domain has an endonuclease activity. In another more preferred embodiment, said protein domain has an exonuclease activity. In another more preferred embodiment, said catalytic domain is selected from the group consisting of proteins MmeI, Colicin-E7 (CEA7_ECOLX), Colicin-E9, APFL, EndA, Endo I (END1_ECOLI), Human Endo G (NUCG_HUMAN), Bovine Endo G (NUCG_BOVIN), R.HinP1I, I-BasI, I-BmoI, I-HmuI, I-TevI, I-TevII, I-TevIII, I-TwoI, R.MspI, R.MvaI, NucA, NucM, Vvn, Vvn_CLS, Staphylococcal nuclease (NUC_STAAU), Staphylococcal nuclease (NUC_STAHY), Micrococcal nuclease (NUC_SHIFL), Endonuclease yncB, Endodeoxyribonuclease I (ENRN_BPT7), Metnase, Nb.BsrDI, BsrDI A, Nt.BspD6I (R.BspD6I large subunit), ss.BspD6I (R.BspD6I small subunit), R.PleI, MlyI, AlwI, Mva1269I, BsrI, BsmI, Nb.BtsCI, Nt.BtsCI, R1.BtsI, R2.BtsI, BbvCI subunit 1, BbvCI subunit 2, Bpu10I alpha subunit, Bpu10I beta subunit, BmrI, BfiI, I-CreI, hExoI (EXO1_HUMAN), Yeast ExoI (EXO1_YEAST), E. coli ExoI, Human TREX2, Mouse TREX1, Human TREX1, Bovine TREX1, Rat TREX1, Human DNA2, Yeast DNA2 (DNA2_YEAST), VP16 and RBBP8, as listed in Table 2 (SEQ ID NO: 339 to SEQ ID NO: 397 and SEQ ID NO: 598-599), a functional mutant, a variant or a derivative thereof. In another embodiment, the chimeric protein according to the present invention comprises a catalytic domain that is a polypeptide comprising an amino acid sequence having at least 80%, preferably at least 90%, more preferably at least 95% identity with any of SEQ ID NO: 339 to 397 and SEQ ID NO: 598-599. In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with I-TevI (SEQ ID NO: 349). In another embodiment, said catalytic domain has an identity with I-TevI (SEQ ID NO: 349) and is fused to the N-terminal domain of said core scaffold.
In another embodiment, said catalytic domain has an identity with I-TevI (SEQ ID NO: 349) and is fused to the C-terminal domain of said core scaffold. In another embodiment, said chimeric protein according to the present invention acts as a monomer wherein two of said monomers respectively bind one nucleic acid sequence adjacent to a nucleic acid target sequence, thereby together processing said nucleic acid target sequence. In another embodiment, said chimeric protein comprises a catalytic domain having identity with I-TevI (SEQ ID NO: 349) and fused to the C-terminal domain of said core scaffold and acts as a monomer wherein two of said monomers respectively bind one nucleic acid sequence adjacent to a nucleic acid target sequence, thereby together processing said nucleic acid target sequence. In this last case, the first and the second monomers may or may not have the same amino acid sequence. In another embodiment, said chimeric protein comprises a catalytic domain having identity with I-TevI (SEQ ID NO: 349) and fused to the C-terminal domain of said core scaffold acting as a first monomer binding one nucleic acid sequence adjacent to a nucleic acid target sequence and wherein a second chimeric protein monomer comprising I-TevI (SEQ ID NO: 349) or derivatives thereof fused to the N-terminus of said core scaffold acts as a second monomer binding another nucleic acid sequence adjacent to said nucleic acid target sequence, thereby together processing said nucleic acid target sequence.
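The identity thresholds of at least 80%, 90% and 95% mentioned above can be illustrated by the following minimal sketch. The present description does not specify how identity is computed; the sketch assumes, for illustration only, two sequences that have already been aligned to the same length (gaps written as "-") and counts identical residues over the alignment length. The names `percent_identity` and `identity_tier` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def identity_tier(pct):
    """Return the highest threshold (95, 90 or 80) met by pct, else None."""
    for threshold in (95, 90, 80):
        if pct >= threshold:
            return threshold
    return None
```

Other conventions exist (for example dividing by the length of the shorter sequence), and the convention chosen changes the value obtained for a given pair of sequences.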

In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with NucA (SEQ ID NO: 355). In another embodiment, said catalytic domain has an identity with NucA (SEQ ID NO: 355) and is fused to the N-terminal domain of said core scaffold. In another embodiment, said catalytic domain has an identity with NucA (SEQ ID NO: 355) and is fused to the C-terminal domain of said core scaffold. In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with ColE7 (SEQ ID NO: 340). In another embodiment, said catalytic domain has an identity with ColE7 (SEQ ID NO: 340) and is fused to the N-terminal domain of said core scaffold. In another embodiment, said catalytic domain has an identity with ColE7 (SEQ ID NO: 340) and is fused to the C-terminal domain of said core scaffold.

In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with FokI (SEQ ID NO: 600) and is fused to the N-terminal domain of said core scaffold. In another embodiment, said additional catalytic domain at the N-terminus of said core scaffold comprises an amino acid sequence having at least 80%, preferably at least 90%, more preferably at least 95% identity with FokI (SEQ ID NO: 600).

In another more preferred embodiment, any combination of two catalytic domains selected from the group consisting of proteins MmeI, Colicin-E7 (CEA7_ECOLX), Colicin-E9, APFL, EndA, Endo I (END1_ECOLI), Human Endo G (NUCG_HUMAN), Bovine Endo G (NUCG_BOVIN), R.HinP1I, I-BasI, I-BmoI, I-HmuI, I-TevI, I-TevII, I-TevIII, I-TwoI, R.MspI, R.MvaI, NucA, NucM, Vvn, Vvn_CLS, Staphylococcal nuclease (NUC_STAAU), Staphylococcal nuclease (NUC_STAHY), Micrococcal nuclease (NUC_SHIFL), Endonuclease yncB, Endodeoxyribonuclease I (ENRN_BPT7), Metnase, Nb.BsrDI, BsrDI A, Nt.BspD6I (R.BspD6I large subunit), ss.BspD6I (R.BspD6I small subunit), R.PleI, MlyI, AlwI, Mva1269I, BsrI, BsmI, Nb.BtsCI, Nt.BtsCI, R1.BtsI, R2.BtsI, BbvCI subunit 1, BbvCI subunit 2, Bpu10I alpha subunit, Bpu10I beta subunit, BmrI, BfiI, I-CreI, hExoI (EXO1_HUMAN), Yeast ExoI (EXO1_YEAST), E. coli ExoI, Human TREX2, Mouse TREX1, Human TREX1, Bovine TREX1, Rat TREX1, Human DNA2, Yeast DNA2 (DNA2_YEAST), VP16 and RBBP8 as listed in Table 2 (SEQ ID NO: 339 to SEQ ID NO: 397 and SEQ ID NO: 598-599), or a functional mutant, a variant or a derivative of these protein domains, can be fused either to the N-terminus or to the C-terminus or to both the N-terminus part and the C-terminus part of said core scaffold according to the present invention, respectively. For example, I-HmuI protein domain can be fused to the N-terminus part of said core scaffold and ColE7 protein domain can be fused to the C-terminus part of said core scaffold. In another example, I-TevI protein domain can be fused to the N-terminus part of said core scaffold and ColE7 protein domain can be fused to the C-terminus part of said core scaffold. In the scope of the present invention, it can be envisioned to fuse one protein domain between two core scaffolds according to the invention, each one comprising at least one set of RVDs. In this last case, the number of RVDs for each engineered core TALE scaffold may or may not be the same.

In another embodiment, the chimeric protein according to the present invention comprises catalytic domains respectively fused to the C-terminus and to the N-terminus of said core scaffold and selected from those having identity with:

(i) NucA domain (SEQ ID NO: 355) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus;
(ii) ColE7 domain (SEQ ID NO: 340) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus;
(iii) NucA domain (SEQ ID NO: 355) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus;
(iv) ColE7 domain (SEQ ID NO: 340) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus;
(v) NucA domain (SEQ ID NO: 355) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus;
(vi) ColE7 domain (SEQ ID NO: 340) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus;
(vii) FokI domain (SEQ ID NO: 600) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus;
(viii) FokI domain (SEQ ID NO: 600) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus.
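The eight arrangements (i) to (viii) listed above can be written down as a small data structure, which may help when enumerating constructs. The names `SEQ_ID`, `DUAL_FUSIONS` and `describe` are hypothetical and serve illustration only; the domain names and SEQ ID NOs are those given in the list.

```python
# SEQ ID NO of each catalytic domain named in arrangements (i) to (viii).
SEQ_ID = {"NucA": 355, "ColE7": 340, "I-TevI": 349, "FokI": 600}

# (N-terminal domain, C-terminal domain) pairs, in the order (i) to (viii).
DUAL_FUSIONS = [
    ("NucA", "NucA"),
    ("ColE7", "ColE7"),
    ("NucA", "ColE7"),
    ("ColE7", "NucA"),
    ("NucA", "I-TevI"),
    ("ColE7", "I-TevI"),
    ("FokI", "ColE7"),
    ("FokI", "NucA"),
]


def describe(n_domain, c_domain):
    """Return a one-line description of an N-terminal/C-terminal arrangement."""
    return (
        f"{n_domain} (SEQ ID NO: {SEQ_ID[n_domain]}) - core scaffold - "
        f"{c_domain} (SEQ ID NO: {SEQ_ID[c_domain]})"
    )
```

For instance, `describe(*DUAL_FUSIONS[7])` describes arrangement (viii), with a FokI domain at the N-terminus and a NucA domain at the C-terminus of the core scaffold.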

In another embodiment, the chimeric protein according to the present invention comprises catalytic domains respectively fused to the C-terminus and to the N-terminus of said core scaffold and selected from those having identity with:

(i) NucA domain (SEQ ID NO: 355) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus;
(ii) ColE7 domain (SEQ ID NO: 340) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus;
(iii) FokI domain (SEQ ID NO: 600) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus;
(iv) FokI domain (SEQ ID NO: 600) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus,
said chimeric protein acting as a monomer wherein two of said monomers respectively bind one nucleic acid sequence adjacent to a nucleic acid target sequence, thereby together processing said nucleic acid target sequence.

In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with FokI (SEQ ID NO: 600) which is fused to the N-terminal domain of said core scaffold, wherein said chimeric protein acts as a monomer and wherein a second monomer binds another nucleic acid sequence adjacent to a nucleic acid target sequence, thereby together processing said nucleic acid target sequence. In this case, the first and the second monomers may or may not have the same amino acid sequence.

In another embodiment, said catalytic domain of the chimeric protein of the invention has an identity with FokI (SEQ ID NO: 600) which is fused to the N-terminal domain of said core scaffold, wherein said chimeric protein acts as a monomer and wherein a second monomer, comprising a catalytic domain having an identity with FokI (SEQ ID NO: 600) which is fused to the C-terminal domain of said core scaffold, binds another nucleic acid sequence adjacent to a nucleic acid target sequence thereby together processing said nucleic acid target sequence.

TABLE 2 List of protein domains for chimeric proteins. GENBANK/ SEQ SWISS-PROT ID ID NAME NO FASTA SEQUENCE ACC85607.1 MmeI 339 >gi|186469979|gb|ACC85607.1| MmeI [Methylophilus methylotrophus] MALSWNEIRRKAIEFSKRWEDASDENSQAKPFLIDFFEVFGITNKRVATFEHAVKKFAKAHKEQSRGFVD LFWPGILLIEMKSRGKDLDKAYDQALDYFSGIAERDLPRYVLVCDFQRFRLTDLITKESVEFLLKDLYQN VRSFGFIAGYQTQVIKPQDPINIKAAERMGKLHDTLKLVGYEGHALELYLVRLLFCLFAEDTTIFEKSLF QEYIETKTLEDGSDLAHHINTLFYVLNTPEQKRLKNLDEHLAAFPYINGKLFEEPLPPAQFDKAMREALL DLCSLDWSRISPAIFGSLFQSIMDAKKRRNLGAHYTSEANILKLIKPLFLDELWVEFEINKNNKNKLLAF HKKLRGLTFFDPACGCGNFLVITYRELRLLEIEVLRGLHRGGQQVLDIEHLIQINVDQFFGIEIEEFPAQ IAQVALWLTDHQMNMKISDEFGNYFARIPLKSTPHILNANALQIDWNDVLEAKKCCFILGNPPFVGKSKQ TPGQKADLLSVFGNLKSASDLDLVAAWYPKAAHYIQTNANIRCAFVSTNSITQGEQVSLLWPLLLSLGIK INFAHRTFSWTNEASGVAAVHCVIIGFGLKDSDEKIIYEYESINGEPLAIKAKNINPYLRDGVDVIACKR QQPISKLPSMRYGNKPTDDGNFLFTDEEKNQFITNEPSSEKYFRRFVGGDEFINNTSRWCLWLDGADISE IRAMPLVLARIKKVQEFRLKSSAKPTRQSASTPMKFFYISQPDTDYLLIPETSSENRQFIPIGFVDRNVI SSNATYHIPSAEPLIFGLLSSTMHNCWMRNVGGRLESRYRYSASLVYNTFPWIQPNEKQSKAIEEAAFAI LKARSNYPNESLAGLYDPKTMPSELLKAHQKLDKAVDSVYGFKGPNTEIARIAFLFETYQKMTSLLPPEK EIKKSKGKN Q47112.2 Colicin-E7 340 >gi|12644448|sp|Q47112.2|CEA7_ECOLX RecName: Full = Colicin-E7 (CEA7_ECOLX) MSGGDGRGHNSGAHNTGGNINGGPTGLGGNGGASDGSGWSSENNPWGGGSGSGVHWGGGSGHGNGGG NSN SGGGSNSSVAAPMAFGFPALAAPGAGTLGISVSGEALSAAIADIFAALKGPFKFSAWGIALYGILPSEIA KDDPNMMSKIVTSLPAETVTNVQVSTLPLDQATVSVTKRVTDVVKDTRQHIAVVAGVPMSVPVVNAKPTR TPGVFHASFPGVPSLIVSTVKGLPVSTTLPRGITEDKGRTAVPAGFTFGGGSHEAVIRFPKESGQKPVYV SVTDVLTPAQVKQRQDEEKRLQQEWNDAHPVEVAERNYEQARAELNQANKDVARNQERQAKAVQVYNSRK SELDAANKTLADAKAEIKQFERFAREPMAAGHRMWQMAGLKAQRAQTDVNNKKAAFDAAAKEKSDADVAL SSALERRKQKENKEKDAKAKLDKESKRNKPGKATGKGKPVNNKWLNNAGKDLGSPVPDRIANKLRDKEFK SFDDFRKKFWEEVSKDPELSKQFSRNNNDRMKVGKAPKTRTQDVSGKRTSFELHHEKPISQNGGVYDMDN ISVVTPKRHIDIHRGK CAA38134.1 EndA 341 >gi|47374|emb|CAA38134.1| EndA [Streptococcus pneumoniae] MNKKTRQTLIGLLVLLLLSTGSYYIKQMPSAPNSPKTNLSQKKQASEAPSQALAESVLTDAVKSQIKGSL 
EWNGSGAFIVNGNKTNLDAKVSSKPYADNKTKTVGKETVPTVANALLSKATRQYKNRKETGNGSTSWTPP GWHQVKNLKGSYTHAVDRGHLLGYALIGGLDGFDASTSNPKNIAVQTAWANQAQAEYSTGQNYYESKVRK ALDQNKRVRYRVTLYYASNEDLVPSASQIEAKSSDGELEFNVLVPNVQKGLQLDYRTGEVTVTQ P25736.1 Endo I 342 >gi|119325|sp|P25736.1|END1_ECOLI RecName: Full = Endonuclease-1; (END1_ECOLI) AltName: Full = Endonuclease I; Short = Endo I; Flags: Precursor MYRYLSIAAVVLSAAFSGPALAEGINSFSQAKAAAVKVHADAPGTFYCGCKINWQGKKGVVDLQSCGYQV RKNENRASRVEWEHVVPAWQFGHQRQCWQDGGRKNCAKDPVYRKMESDMHNLQPSVGEVNGDRGNFMY SQ WNGGEGQYGQCAMKVDFKEKAAEPPARARGAIARTYFYMRDQYNLTLSRQQTQLFNAWNKMYPVTDWECE RDERIAKVQGNHNPYVQRACQARKS Q14249.4 Human Endo G 343 >gi|317373579|sp|Q14249.4|NUCG_HUMAN RecName: Full = Endonuclease G, (NUCG_HUMAN) mitochondrial; Short = Endo G; Flags: Precursor MRALRAGLTLASGAGLGAVVEGWRRRREDARAAPGLLGRLPVLPVAAAAELPPVPGGPRGPGELAKYGLP GLAQLKSRESYVICYDPRTRGALWVVEQLRPERLRGDGDRRECDFREDDSVHAYHRATNADYRGSGFDRG HLAAAANHRWSQKAMDDTFYLSNVAPQVPHLNQNAWNNLEKYSRSLTRSYQNVYVCTGPLFLPRTEADGK SYVKYQVIGKNHVAVPTHFFKVLILEAAGGQIELRTYVMPNAPVDEAIPLERFLVPIESIERASGLLFVP NILARAGSLKAITAGSK P38447.1 Bovine Endo G 344 >gi|585596|sp|P38447.1|NUCG_BOVIN RecName: Full = Endonuclease G, (NUCG_BOVIN) mitochondrial; Short = Endo G; Flags: Precursor MQLLRAGLTLALGAGLGAAAESWWRQRADARATPGLLSRLPVLPVAAAAGLPAVPGAPAGGGPGELAKYG LPGVAQLKSRASYVLCYDPRTRGALWVVEQLRPEGLRGDGNRSSCDFHEDDSVHAYHRATNADYRGSGFD RGHLAAAANHRWSQKAMDDTFYLSNVAPQVPHLNQNAWNNLEKYSRSLTRTYQNVYVCTGPLFLPRTEAD GKSYVKYQVIGKNHVAVPTHFFKVLILEAAGGQIELRSYVMPNAPVDEAIPLEHFLVPIESIERASGLLF VPNILARAGSLKAITAGSK AAW33811.1 R.HinP1I 345 >gi|57116674|gb|AAW33811.1| R.HinP1I restriction endonuclease [Haemophilus influenzae] MNLVELGSKTAKDGFKNEKDIADRFENWKENSEAQDWLVTMGHNLDEIKSVKAVVLSGYKSDINVQVLVF YKDALDIHNIQVKLVSNKRGFNQIDKHWLAHYQEMWKFDDNLLRILRHFTGELPPYHSNTKDKRRMFMTE FSQEEQNIVLNWLEKNRVLVLTDILRGRGDFAAEWVLVAQKVSNNARWILRNINEVLQHYGSGDISLSPR GSINFGRVTIQRKGGDNGRETANMLQFKIDPTELFDI AAO93095.1 I-BasI 346 >gi|29838473|gb|AAO93095.1| I-BasI [Bacillus phage Bastille] 
MFQEEWKDVTGFEDYYEVSNKGRVASKRTGVIMAQYKINSGYLCIKFTVNKKRTSHLVHRLVAREFCEGY SPELDVNHKDTDRMNNNYDNLEWLTRADNLKDVRERGKLNTHTAREALAKVSKKAVDVYTKDGSEYIATY PSATEAAEALGVQGAKISTVCHGKRQHTGGYHFKFNSSVDPNRSVSKK AAK09365.1 I-BmoI 347 >gi|12958590|gb|AAK09365.1|AF321518_2 intron encoded I-BmoI [Bacillus mojavensis] MKSGVYKITNKNIGKFYIGSSEDCESRLKVHFRNLKNNRHINRYLNNSFNKHGEQVFIGEVIHILPIEEA IAKEQWYIDNFYEEMYNISKSAYHGGDLTSYHPDKRNIILKRADSLKKVYLKMTSEEKAKRWQCVQGENN PMFGRKHTETTKLKISNHNKLYYSTHKNPFKGKKHSEESKTKLSEYASQRVGEKNPFYGKTHSDEFKTYM SKKFKGRKPKNSRPVIIDGTEYESATEASRQLNVVPATILHRIKSKNEKYSGYFYK P34081.1 I-HmuI 348 >gi|465641|sp|P34081.1|HMUI_BPSP1 RecName: Full = DNA endonuclease I-HmuI; AltName: Full = HNH homing endonuclease I-HmuI MEWKDIKGYEGHYQVSNTGEVYSIKSGKTLKHQIPKDGYHRIGLFKGGKGKTFQVHRLVAIHFCEGYEEG LVVDHKDGNKDNNLSTNLRWVTQKINVENQMSRGTLNVSKAQQIAKIKNQKPIIVISPDGIEKEYPSTKC ACEELGLTRGKVTDVLKGHRIHHKGYTFRYKLNG P13299.2 I-TevI 349 >gi|6094464|sp|P13299.2|TEV1_BPT4 RecName: Full = Intron-associated endonuclease 1; AltName: Full = I-TevI; AltName: Full = IRF protein MKSGIYQIKNTLNNKVYVGSAKDFEKRWKRHFKDLEKGCHSSIKLQRSFNKHGNVFECSILEEIPYEKDL IIERENFWIKELNSKINGYNIADATFGDTCSTHPLKEEIIKKRSETVKAKMLKLGPDGRKALYSKPGSKN GRWNPETHKFCKCGVRIQTSAYTCSKCRNRSGENNSFFNHKHSDITKSKISEKMKGKKPSNIKKISCDGV IFDCAADAARHFKISSGLVTYRVKSDKWNWFYINA P07072.2 I-TevII 350 >gi|20141823|sp|P07072.2|TEV2_BPT4 RecName: Full = Intron-associated endonuclease 2; AltName: Full = I-TevII MKWKLRKSLKIANSVAFTYMVRFPDKSFYIGFKKFKTIYGKDTNWKEYNSSSKLVKEKLKDYKAKWIILQ VFDSYESALKHEEMLIRKYFNNEFILNKSIGGYKFNKYPDSEEHKQKLSNAHKGKILSLKHKDKIREKLI EHYKNNSRSEAHVKNNIGSRTAKKTVSIALKSGNKFRSFKSAAKFLKCSEEQVSNHPNVIDIKITIHPVP EYVKINDNIYKSFVDAAKDLKLHPSRIKDLCLDDNYPNYIVSYKRVEK Q38419.1 I-TevIII 351 >gi|11387192|sp|Q38419.1|TEV3_BPR03 RecName: Full = Intron-associated endonuclease 3; AltName: Full = I-TevIII MNYRKIWIDANGPIPKDSDGRTDEIHHKDGNRENNDLDNLMCLSIQEHYDIHLAQKDYQACHAIKLRMKY SPEEISELASKAAKSREIQIFNIPEVRAKNIASIKSKIENGTFHLLDGEIQRKSNLNRVALGIHNFQQAE 
HIAKVKERNIAAIKEGTHVFCGGKMQSETQSKRVNDGSHHFLSEDHKKRTSAKTLEMVKNGTHPAQKEIT CDFCGHIGKGPGFYLKHNDRCKLNPNRIQLNCPYCDKKDLSPSTYKRWHGDNCKARFND AAM00817.1 I-TwoI 352 >gi|19881200|gb|AAM00817.1|AF485080_2 HNH endonuclease I-TwoI [Staphylococcus phage Twort] MEELWKEIPGFNSYMISNKGQVYSRKRNKILALRTDKNGYKRISIFNNEGKRILLGVHKLVLLGFKGINT EKPIPHHKNNIKDDNRLENLEWVIVSENTKHAYDIGALKSPRRVTCTLYYKGEPLSCYDSLFDLAKALKV SRSVIESPRNGLVLSTFEVKREPTIQGLPLNKEIFEHSLIKGLGNPPLKVYNEDETYYFLTLMDISKYFN ESYSKVQRGYYKGKWKSYIIEHIDFYEYYKQTH P11405.1 R.MspI 353 >gi|135239|sp|P11405.1|T2M1_MORSP RecName: Full = Type-2 restriction enzyme MspI; Short = R.MspI; AltName: Full = Endonuclease MspI; AltName: Full = Type II restriction enzyme MspI MRTELLSKLYDDFGIDQLPHTQHGVTSDRLGKLYEKYILDIFKDIESLKKYNTNAFPQEKDISSKLLKAL NLDLDNIIDVSSSDTDLGRTIAGGSPKTDATIRFTFHNQSSRLVPLNIKHSSKKKVSIAEYDVETICTGV GISDGELKELIRKHQNDQSAKLFTPVQKQRLTELLEPYRERFIRWCVTLRAEKSEGNILHPDLLIRFQVI DREYVDVTIKNIDDYVSDRIAEGSKARKPGFGTGLNWTYASGSKAKKMQFKG R.MvaI R.MvaI 354 >gi|119392963|gb|AAM03024.2|AF472612_1 R.MvaI [Kocuria varians] MSEYLNLLKEAIQNVVDGGWHETKRKGNTGIGKTFEDLLEKEEDNLDAPDFHDIEIKTHETAAKSLLTLF TKSPTNPRGANTMLRNRYGKKDEYGNNILHQTVSGNRKTNSNSYNYDFKIDIDWESQVVRLEVFDKQDIM IDNSVYWSFDSLQNQLDKKLKYIAVISAESKIENEKKYYKYNSANLFTDLTVQSLCRGIENGDIKVDIRI GAYHSGKKKGKTHDHGTAFRINMEKLLEYGEVKVIV CAA45962.1 NucA 355 >gi|39041|emb|CAA45962.1| NucA [Nostoc sp. 
PCC 7120] MGICGKLGVAALVALIVGCSPVQSQVPPLTELSPSISVHLLLGNPSGATPTKLTPDNYLMVKNQYALSYN NSKGTANWVAWQLNSSWLGNAERQDNFRPDKTLPAGWVRVTPSMYSGSGYDRGHIAPSADRTKTTEDNAA TFLMTNMMPQTPDNNRNTWGNLEDYCRELVSQGKELYIVAGPNGSLGKPLKGKVIVPKSTWKIVVVLDSP GSGLEGITANTRVIAVNIPNDPELNNDWRAYKVSVDELESLTGYDFLSNVSPNIQTSIESKVDN P37994.2 NucM 356 >gi|313104150|sp|P37994.2|NUCM_DICD3 RecName: Full = Nuclease nucM; Flags: Precursor MLRNLVIFAVLGAGLTTLAAAGQDINNFTQAKAAAAKIHQDAPGTFYCGCKINWQGKKGTPDIASCGYQV RKDANRASRIEWEHVVPAWQFGHQRQCWQDGGRKNCTKDDVYRQIETDLHNLQPAIGEVNGDRGNFMYSQ WNGGERQYGQCEMKIDFKSQLAEPPERARGAIARTYFYMRDRYNLNLSRQQTQLFDAWNKQYPATTWECT REKRIAAVQGNHNPYVQQACQP AAF19759.1 Vvn 357 >gi|6635279|gb|AAF19759.1|AF063303_1 nuclease precursor Vvn [Vibrio vulnificus] MKRLFIFIASFTAFAIQAAPPSSFSAAKQQAVKIYQDHPISFYCGCDIEWQGKKGIPNLETCGYQVRKQQ TRASRIEWEHVVPAWQFGHHRQCWQKGGRKNCSKNDQQFRLMEADLHNLTPAIGEVNGDRSNFNFSQWNG VDGVSYGRCEMQVNFKQRKVMPQTELRGSIARTYLYMSQEYGFQLSKQQQQLMQAWNKSYPVDEWECTRD DRIAKIQGNHNPFVQQSCQTQ AAF19759.1 Vvn_CLS 358 >Vvn_CLS (variant of AAF19759.1) (reference) MASGAPPSSFSAAKQQAVKIYQDHPISFYCGCDIEWQGKKGIPNLETCGYQVRKQQTRASRIEWEHVVPA WQFGHHRQCWQKGGRKNCSKNDQQFRLMEADLHNLTPAIGEVNGDRSNFNFSQWNGVDGVSYGRCEMQV N FKQRKVMPPDRARGSIARTYLYMSQEYGFQLSKQQQQLMQAWNKSYPVDEWECTRDDRIAKIQGNHNPFV QQSCQTQGSSAD P00644.1 Staphylococcal 359 >gi|128852|sp|P00644.1|NUC_STAAU RecName: Full = Thermonuclease; nuclease Short = TNase; AltName: Full = Micrococcal nuclease; AltName: Full = (NUC_STAAU) Staphylococcal nuclease; Contains: RecName: Full = Nuclease B; Contains: RecName: Full = Nuclease A; Flags: Precursor MLVMTEYLLSAGICMAIVSILLIGMAISNVSKGQYAKRFFFFATSCLVLTLVVVSSLSSSANASQTDNGV NRSGSEDPTVYSATSTKKLHKEPATLIKAIDGDTVKLMYKGQPMTFRLLLVDTPETKHPKKGVEKYGPEA SAFTKKMVENAKKIEVEFDKGQRTDKYGRGLAYIYADGKMVNEALVRQGLAKVAYVYKPNNTHEQHLRKS EAQAKKEKLNIWSEDNADSGQ P43270.1 Staphylococcal 360 >gi|1171859|sp|P43270.1|NUC_STAHY RecName: Full = Thermonuclease;  nuclease Short = TNase; AltName: Full = Micrococcal nuclease; AltName: Full = (NUC_STAHY) Staphylococcal 
nuclease; Flags: Precursor MKKITTGLIIVVAAIIVLSIQFMTESGPFKSAGLSNANEQTYKVIRVIDGDTIIVDKDGKQQNLRMIGVD TPETVKPNTPVQPYGKEASDFTKRHLTNQKVRLEYDKQEKDRYGRTLAYVWLGKEMFNEKLAKEGLARAK FYRPNYKYQERIEQAQKQAQKLKKNIWSN P29769.1 Micrococcal 361 >gi|266681|sp|P29769.1|NUC_SHIFL RecName: Full = Micrococcal nuclease; nuclease Flags: Precursor (NUC_SHIFL) MKSALAALRAVAAAVVLIVSVPAWADFRGEVVRILDGDTIDVLVNRQTIRVRLADIDAPESGQAFGSRAR QRLADLTFRQEVQVTEKEVDRYGRTLGVVYAPLQYPGGQTQLTNINAIMVQEGMAWAYRYYGKPTDAQMY EYEKEARRQRLGLWSDPNAQEPWKWRRASKNATN P94492.1 Endonuclease 362 >gi|81345826|sp||YNCB_BACSU RecName: Full = Endonuclease yncB; Flags: yncB Precursor MKKILISMIAIVLSITLAACGSNHAAKNHSDSNGTEQVSQDTHSNEYNQTEQKAGTPHSKNQKKLVNVTL DRAIDGDTIKVIYNGKKDTVRYLLVDTPETKKPNSCVQPYGEDASKRNKELVNSGKLQLEFDKGDRRDKY GRLLAYVYVDGKSVQETLLKEGLARVAYVYEPNTKYIDQFRLDEQEAKSDKLSIWSKSGYVTNRGFNGCV K P00641.1 Endodeoxyribo- 363 >gi|119370|sp|P00641.1|ENRN_BPT7 RecName: Full = Endodeoxyribonuclease nuclease I 1; AltName: Full = Endodeoxyribonuclease I; Short = Endonuclease (ENRN_BPT7) MAGYGAKGIRKVGAFRSGLEDKVSKQLESKGIKFEYEEWINPYVIPASNHTYTPDFLLPNGIFVETKGLW ESDDRKKHLLIREQHPELDIRIVFSSSRTKLYKGSPTSYGEFCEKHGIKFADKLIPAEWIKEPKKEVPFD RLKRKGGKK Q53H47.1 Metnase 364 >gi|74740552|sp|Q53H47.1|SETMR_HUMAN RecName: Full = Histone-lysine N-methyltransferase SETMAR; AltName: Full = SET domain and mariner transposase fusion gene-containing protein; Short = HsMar1; Short = Metnase; Includes: RecName: Full = Histone-lysine N-methyltransferase; Includes: RecName: Full = Mariner transposase Hsmar1 MAEFKEKPEAPTEQLDVACGQENLPVGAWPPGAAPAPFQYTPDHVVGPGADIDPTQITFPGCICVKTPCL PGTCSCLRHGENYDDNSCLRDIGSGGKYAEPVFECNVLCRCSDHCRNRVVQKGLQFHFQVFKTHKKGWGL RTLEFIPKGRFVCEYAGEVLGFSEVQRRIHLQTKSDSNYIIAIREHVYNGQVMETFVDPTYIGNIGRFLN HSCEPNLLMIPVRIDSMVPKLALFAAKDIVPEEELSYDYSGRYLNLTVSEDKERLDHGKLRKPCYCGAKS CTAFLPFDSSLYCPVEKSNISCGNEKEPSMCGSAPSVFPSCKRLTLETMKMMLDKKQIRAIFLFEFKMGR KAAETTRNINNAFGPGTANERTVQWWFKKFCKGDESLEDEERSGRPSEVDNDQLRAIIEADPLTTTREVA 
EELNVNHSTVVRHLKQIGKVKKLDKWVPHELTENQKNRRFEVSSSLILRNHNEPFLDRIVTCDEKWILYD NRRRSAQWLDQEEAPKHFPKPILHPKKVMVTIWWSAAGLIHYSFLNPGETITSEKYAQEIDEMNQKLQRL QLALVNRKGPILLHDNARPHVAQPTLQKLNELGYEVLPHPPYSPDLLPTNYHVFKHLNNFLQGKRFHNQQ DAENAFQEFVESQSTDFYATGINQLISRWQKCVDCNGSYFD ABD15132.1 Nb.BsrDI 365 >gi|86757493|gb|ABD15132.1| Nb.BsrDI [Geobacillus stearothermophilus] MTEYDLHLYADSFHEGHWCCENLAKIAQSDGGKHQIDYLQGFIPRHSLIFSDLIINITVFGSYKSWKHLP KQIKDLLFWGKPDFIAYDPKNDKILFAVEETGAVPTGNQALQRCERIYGSARKQIPFWYLLSEFGQHKDG GTRRDSIWPTIMGLKLTQLVKTPSIILHYSDINNPEDYNSGNGLKFLFKSLLQIIINYCTLKNPLKGMLE LLSIQYENMLEFIKSQWKEQIDFLPGEEILNTKTKELARMYASLAIGQTVKIPEELFNWPRIDKVNFKSP QGLIKYDELCYQLEKAVGSKKAYCLSNNAGAKPQKLESLKEWINSQKKLFDKAPKLTPPAEFNMKLDAFP VTSNNNYYVTTSKNILYLFDYWKDLRIAIETAFPRLKGKLPTDIDEKPALIYICNSVKPGRLFGDPFTGQ LSAFSTIFGKKNIDMPRIVVAYYPHQIYSQALPKNNKSNKGITLKKELTDFLIFHGGVVVKLNEGKAY ABD15133.1 BsrDI A 366 >gi|86757494|gb|ABD15133.1| BsrDI A [Geobacillus stearothermophilus] MTDYRYSFELSEEIARWAFEIKTKNTDWFVAFSNPTAGPWKRVMAIDKASNREGEVHRFGREDERPDIIL VNDNISLILILEAKEKLNQLISKSQVDKSVDVFLTLSSILKEKSDNNYWGDRTKYINVLGILWGSEQETS QKDIDNAFRVYRDSLVKNLKEINPTPTNICTDILVGVESIKNKKEEISIKIHVSNIYAEIYPKFTGKHLL EKLAVLN ABN42182.1 Nt.BspD6I 367 >gi|125396996|gb|ABN42182.1| heterodimeric restriction endonuclease (R.BspD6I R.BspD6I large subunit [Bacillus sp. 
D6] large subunit) MAKKVNWYVSCSPRSPEKIQPELKVLANFEGSYWKGVKGYKAQEAFAKELAALPQFLGTTYKKEAAFSTR DRVAPMKTYGFVFVDEEGYLRITEAGKMLANNRRPKDVFLKQLVKWQYPSFQHKGKEYPEEEWSINPLVF VLSLLKKVGGLSKLDIAMFCLTATNNNQVDEIAEEIMQFRNEREKIKGQNKKLEFTENYFFKRFEKIYGN VGKIREGKSDSSHKSKIETKMRNARDVADATTRYFRYTGLFVARGNQLVLNPEKSDLIDEIISSSKVVKN YTRVEEFHEYYGNPSLPQFSFETKEQLLDLAHRIRDENTRLAEQLVEHFPNVKVEIQVLEDIYNSLNKKV DVETLKDVIYHAKELQLELKKKKLQADFNDPRQLEEVIDLLEVYHEKKNVIEEKIKARFIANKNTVFEWL TWNGFIILGNALEYKNNFVIDEELQPVTHAAGNQPDMEIIYEDFIVLGEVTTSKGATQFKMESEPVTRHY LNKKKELEKQGVEKELYCLFIAPEINKNTFEEFMKYNIVQNTRIIPLSLKQFNMLLMVQKKLIEKGRRLS SYDIKNLMVSLYRTTIECERKYTQIKAGLEETLNNWVVDKEVRF ABN42183.1 ss.BspD6I 368 >gi|125396997|gb|ABN42183.1| heterodimeric restriction endonuclease (R.BspD6I R.BspD6I small subunit [Bacillus sp. D6] small subunit) MQDILDFYEEVEKTINPPNYFEWNTYRVFKKLGSYKNLVPNFKLDDSGHPIGNAIPGVEDILVEYEHFSI LIECSLTIGEKQLDYEGDSVVRHLQEYKKKGIEAYTLFLGKSIDLSFARHIGFNKESEPVIPLTVDQFKK LVTQLKGDGEHFNPNKLKEILIKLLRSDLGYDQAEEWLTFIEYNLK AAK27215.1 R.PleI 369 >gi|13448813|gb|AAK27215.1|AF355461_2 restriction endonuclease R.PleI [Paucimonas lemoignei] MAKPIDSKVLFITTSPRTPEKMVPEIELLDKNFNGDVWNKDTQTAFMKILKEESFFDGEGKNDPAFSARD RINRAPKSLGFVILTPKLSLTDAGVELIKAKRKDDIFLRQMLKFQLPSPYHKLSDKAALFYVKPYLEIFR LVRHFGSLTFDELMIFGLQIIDFRIFNQIVDKIEDFRVGKIENKGRYKTYKKERFEEELGKIYKDELFGL TEASAKTLITKKGNNMRDYADACVRYLRATGMVNVSYQGKSLSIVQEKKEEVDFFLKNTEREPCFINDEA SYVSYLGNPNYPKLFVDDVDRIKKKLRFDFKKTNKVNALTLPELKEELENEILSRKENILKSQISDIKNF KLYEDIQEVFEKIENDRTLSDAPLMLEWNTWRAMTMLDGGEIKANLKFDDFGSPMSTAIGNMPDIVCEYD DFQLSVEVTMASGQKQYEMEGEPVSRHLGKLKKSSEKPVYCLFIAPKINPSSVAHFFMSHKVDIEYYGGK SLIIPLELSVFRKMIEDTFKASYIPKSDNVHKLFKNFASIADEAGNEKVWYEGVKRTAMNWLSLS AAK39546.1 MlyI 370 >gi|13786046|gb|AAK39546.1|AF355462_2 MlyIR [Micrococcus lylae] MASLSKTKHLFGFTSPRTIEKIIPELDILSQQFSGKVWGENQINFFDAIFNSDFYEGTTYPQDPALAARD RITRAPKALGFIQLKPVIQLTKAGNQLVNQKRLPELFTKQLLKFQLPSPYHTQSPTVNFNVRPYLELLRL INELGSISKTEIALFFLQLVNYNKFDEIKNKILKFRETRKNNRSVSWKTYVSQEFEKQISIIFADEVTAK 
NFRTRESSDESFKKFVKTKEGNMKDYADAFFRYIRGTQLVTIDKNLHLKISSLKQDSVDFLLKNTDRNAL NLSLMEYENYLFDPDQLIVLEDNSGLINSKIKQLDDSINVESLKIDDAKDLLNDLEIQRKAKTIEDTVNH LKLRSDIEDILDVFAKIKKRDVPDVPLFLEWNIWRAFAALNHTQAIEGNFIVDLDGMPLNTAPGKKPDIE INYGSFSCIVEVTMSSGETQFNMEGSSVPRHYGDLVRKVDHDAYCIFIAPKVAPGTKAHFFNLNRLSTKH YGGKTKIIPMSLDDFICFLQVGITHNFQDINKLKNWLDNLINFNLESEDEEIWFEEIISKISTWAI YP_004134094.1 AlwI 371 >gi|319768594|ref|YP_004134094.1| restriction endonuclease, type II, AlwI [Geobacillus sp. Y412MC52] MNKKNTRKVWFITRPERDPRFHQEALLALQKATDDFRLKWAGNREVHKRYEEELANMGIKRNNVSHDGSG GRTWMAMLKTFSYCYVDDDGYIRLTKVGEKLIQGEKVYENTRKQVLTLQYPNAYFLEPGFRPKFDEGFRI RPVLFLIKLANDERLDFYVTKEEITYFAMTAQKDSQLDEIVHKILAFRKAGPREREEMKQDIAAKFDHRE RSDKGARDFYEAHSDVAHTFMLISDYTGLVEYIRGKALKGDSSKINEIKQEIAEIEKRYPFNTRYMISLE RMAENSGLDVDSYKASRYGNIKPAANSSKLRAKAERILAQFPSIESMSKEEIAGALQKYLSPRDIEKVIH EIVENKDDFEGINSDFVETYLNEKDNLAFEDKTGQIFSALGFDVAMRPKAKNGERTEIEIIARYGGSKFG IIDAKNYAGKFPLSSSLVSHMASEYIPNYTGYEGKELTFFGYVTANDFSGERNLEKISDKAKRITGNPIS GFLVTARTLLGFLDYCIENDVPLEDRAELFVKAVKNKGYKSLEALLRELKETI AAY97906.1 Mva1269I 372 >gi|68480350|gb|AAY97906.1| Mva1269I restriction endonuclease [Kocuria varians] MYLNTAVFNIYGDNIVECSRAFHYILEGFKLANISITQEYDLQNITTPKFCIYTDKFRYIFIFIPGTSAS RWNKDIYKELVLNNGGPLKEGADAIITRIFSEDSELVLASMEFSAALPAGNNTWQRSGRAYSLTAANIPY FYIVQLGGKEIKKGKDGKSDKFATRLPNPALSLSFTLNTIKKPAPSLIVYDQAPEADSAISDLYSNCYGI DDFSLYLFKLITEENNLHELKNIYNKNVEFLQLRSVDEKGKNFSGKDYKYIFEHKDPYKGLTEVVKERKI PWKKKTATKTFENFPLRNQAPIFRLIDFLSTKSYGIVSKDSLPLTFIPSEHRVEVANYICNQLYIDKVSD EFVKWIYKKEDLAICIINGFKPGGDDSRPINGLPPFTKMLTNLDILTLMFGPAPPTQWDYLDSDPEKLNK TNGLWQSIFAFSDAILVDSSTRDNNKFVYNAYLKEHWVVQREKKESNTPISYFPKSVGEHDVDTSLHILF TYIGKHFESACNPPGGDWSGVSLLKNNIEYRWTSMYRVSQDGTKRPDHIYQLVYNSTDTLLLIESKGIKN DLLKSKEANVGIGMINYLKNLMARDYTAVKKDGEWKNIHGQMTLDKFLTFSAVAYLFTTINDNEYTSAAE LLVHSNTQLAFALEIKEKNSVMHIFTANTVAYNFAEYLLETMRNSHLPLKIYKPI ADR72996.1 BsrI 373 >gi|313667100|gb|ADR72996.1| BsrI [Geobacillus stearothermophilus] MRNIRIYSEVKEQGIFFKEVIQSVLEKANVEVVLVNSAMLDYSDVSVISLIRNQKKFDLLVSEVRDKREI 
PIVMVEFSTAVTTDDHELQRADAMFWAYKYKIPYLKISPMEKKSQTADDKFGGGRLLSVNDQIIHMYRTD GVMYHIEWESMDNSAYVKNAELYPSCPDCAPELASLFRCLLETIEKCENIEDYYRILLDKLGKQKVAVKW GNFREEKTLEQWKHEKFDLLERFSKSSSRMEYDKDKKELKIKVNRYGHAMDPERGILAFWKLVLGDEWKI VAEFQLQRKTLKGRQSYQSLFDEVSQEEKLMNIASEIIKNGNVISPDKAIEIHKLATSSTMISTIDLGTP ERKYITDDSLKGYLQHGLITNIYKNLLYYVDEIRFTDLQRKTIASLTWNKEIVNDYYKSLMDQLLDKNLR VLPLTSIKNISEDLITWSSKEILINLGYKILAASYPEAQGDRCILVGPTGKKTERKFIDLIAISPKSKGV ILLECKDKLSKSKDDCEKMNDLLNHNYDKVTKLINVLNINNYNYNNIIYTGVAGLIGRKNVDNLPVDFVI KFKYDAKNLKLNWEINSDILGKHSGSFSMEDVAVVRKRS AAL86024.1 BsmI 374 >gi|19347662|gb|AAL86024.1| BsmI [Geobacillus stearothermophilus] MNVFRIHGDNIIECERVIDLILSKINPQKVKRGFISLSCPFIEIIFKEGHDYFHWRFDMFPGFNKNTNDR WNSNILDLLSQKGSFLYETPDVIITSLNNGKEEILMAIEFCSALQAGNQAWQRSGRAYSVGRTGYPYIYI VDFVKYELNNSDRSRKNLRFPNPAIPYSYISHSKNTGNFIVQAYFRGEEYQPKYDKKLKFFDETIFAEDD IADYIIAKLQHRDTSNIEQLLINKNLKMVEFLSKNTKNDNNFTYSEWESIYNGTYRITNLPSLGRFKFRK KIAEKSLSGKVKEFNNIVQRYSVGLASSDLPFGVIRKESRNDFINDVCKLYNINDMKIIKELKEDADLIV CMLKGFKPRGDDNRPDRGALPLVAMLAGENAQIFTFIYGPLIKGAINLIDQDINKLAKRNGLWKSFVSLS DFIVLDCPIIGESYNEFRLIINKNNKESILRKTSKQQNILVDPTPNHYQENDVDTVIYSIFKYIVPNCFS GMCNPPGGDWSGLSIIRNGHEFRWLSLPRVSENGKRPDHVIQILDLFEKPLLLSIESKEKPNDLEPKIGV QLIKYIEYLFDFTPSVQRKIAGGNWEFGNKSLVPNDFILLSAGAFIDYDNLTENDYEKIFEVTGCDLLIA IKNQNNPQKWVIKFKPKNTIAEKLVNYIKLNFKSNIFDTGFFHIEG ADI24225.1 Nb.BtsCI 375 >gi|1297185870|gb|AD24225.1| BtsCI bottom-strand nicking enzyme variant [synthetic construct] MKRILYLLTEERPKINIIHQIINLEYKATLHFGAKIVPVMNEENKFTFIYHVKGIEVEGFDAVLIKIVSG HSSFVDYLVFDSNDLKPEKNTITLFDLDQYELDLSYYFGKGWIVRIPSPSDLPKYVVEETKTDDHESRNT NAYQRSSKFVFCELYYGKEVKKYMLYDISDGRTLSGTDTHNFGMRMLVTNNVNLVGVPNMYLPFTDIKEF INEKNRIADNGPSHNVPIRLKLDKEKNVIYISAKLDKGNGKNKNKISNDPNIGAVAIISATLRNLNWKGD IEIINHNLLPSSISSRSNGNKLLYIMKKLGVRFNNINVNWNNIKNNINYFFYNITSEKIVSIYYHLYVED KLSNARVIFDNHAGCGKSYFRTLNNKIIPVGKEIPLPALVIFDSDQNIVKVIAAAKAENVYNGVEQLSTF DKFIESYINKYYPGAAVECSVITWGKSSNPYVSFYLDKDGSAVFL ADI24224.1 Nt.BtsCI 376 >gi|297185868|gb|ADI24224.1| BtsCI top-strand nicking enzyme variant 
[synthetic construct] MKRILYLLTEERPKINIIHQIINLEYKATLHFGAKIVPVMNEENKFTFIYHVKGIEVEGFDAVLIKIVSG HSSFVDYLVFDSNDLKPEKNTITLFDLDQYELDLSYYFGKGWIVRIPSPSDLPKYVVFETKTDDHESRNT NAYQRSSKFVFCELYYGKEVKKYMLYDISDGRTLSGTDTHNFGMRMLVTNNVNLVGVPNMYLPFTDIKEF INEKNRIADNGPSHNVPIRLKLDKEKNVIYISAKLDKGNGKNKNKISNDPNIGAVAIISATLRNLNWKGD IEIINHNLLPSSISSRSNGNKLLYIMKKLGVRFNNINVNWNNIKNNINYFFYNITSEKIVSIYYHLYVED KLSNARVIFDNHAGCGKSYFRTLNNKIIPVGKEIPLPDLVIFDSDQNIVKVIEAEKAENVYNGVEQLSTF DKFIESYINKYYPGAAVECSVITWGKSSNPYVSFYLDKDGSAVFL >gi|85720924|gb|ABC75874.1| R1.BtsI [Geobacillus thermoglucosidasius] MKITEGIVHVAMRHFLKSNGWKLIAGQYPGGSDDELTALNIVDPVVARDNSPDPRRHSLGKIVPDLIAYK NDDLLVIEAKPKYSQDDRDKLLYLLSERKHDFYAALEKFATERNHPELLPVSKLNIIPGLAFSASENKFK KDPGFVYIRVSGIFEAFMEGYDWG ABC75874.1 R1.BtsI 377 >gi|85720924|gb|ABC75874.1| R1.BtsI [Geobacillus thermoglucosidasius] MKITEGIVHVAMRHFLKSNGWKLIAGQYPGGSDDELTALNIVDPVVARDNSPDPRRHSLGKIVPDLIAYK NDDLLVIEAKPKYSQDDRDKLLYLLSERKHDFYAALEKFATERNHPELLPVSKLNIIPGLAFSASENKFK KDPGFWIRVSGIFEAFMEGYDWG ABC75876.1 R2.BtsI 378 >gi|85720926|gb|ABC75876.1| R2.BtsI [Geobacillus thermoglucosidasius] MQIEQLMKSLTIYFDDIQEGLWFKNLHPLLESASLEAITGSLKRNPNLADVLKYDRPDIILTLNQTPILV IERTIEVPSGHNVGQRYGRLAAASEAGVPLVYFGPYAARKHGGATEGPRYMNLRLFYALDVMQKVNGSAI TTINWPVDQNFEILQDPSKDKRMKEYLEMFFDNLLKYGIAGINLAIRNSSFQAEQLAEREKFVETMITNP EQYDVPPDSVQILNAERFFNELGISENKRIICDEVVLYQVGMTYVRSDPYTGMALLYKYLYILGSERNRC LILKFPNITTDMWKKVAFGSRERKDVRIYRSVSDGILFADGYLSKEEL AAX14652.1 BbvCI subunit 379 >gi|60202520|gb|AAX14652.1| BbvCI endonuclease subunit 1 1 [Brevibacillus brevis] MINEDFFIYEQLSHKKNLEQKGKNAFDEETEELVRQAKSGYHAFIEGINYDEVTKLDLNSSVAALEDYIS IAKEIEKKHKMFNWRSDYAGSIIPEFLYRIVHVATVKAGLKPIFSTRNTIIEISGAAHREGLQIRRKNED FALGFHEVDVKIASESHRVISLAVACEVKTNIDKNKLNGLDFSAERMKRTYPGSAYFLITETLDFSPDEN HSSGLIDEIYVLRKQVRTKNRVQKAPLCPSVFAELLEDILEISYRASNVKGHVYDRLEGGKLIRV AAX14653.1 BbvCI subunit 380 >gi|60202521|gb|AAX14653.1| BbvCI endonuclease subunit 2 2 [Brevibacillus brevis] MFNQFNPLVYTHGGKLERKSKKDKTASKVFEEFGVMEAYNCWKEASLCIQQRDKDSVLKLVAALNTYKDA 
VEPIFDSRLNSAQEVLQPSILEEFFEYLFSRIDSIVGVNIPIRHPAKGYLSLSFNPHNIETLIQSPEYTV RAKDHDFIIGGSAKLTIQGHGGEGETTNIVVPAVAIECKRYLERNMLDECAGTAERLKRATPYCLYFVVA EYLKLDDGAPELTEIDEIYILRHQRNSERNKPGFKPNPIDGELIWDLYQEVMNHLGKIWWDPNSALQRGK VFNRP CAA74998.1 Bpu10I alpha 381 >gi|2894388|emb|CAA74998.1| Bpu10I restriction endonuclease alpha subunit subunit [Bacillus pumilus] MGVEQEWIKNITDMYQSPELIPSHASNLLHQLKREKRNEKLKKALEIITPNYISYISILLNNHNMTRKEI VILVDALNEYMNTLRHPSVKSVFSHQADFYSSVLPEFFNLLFRNLIKGLNEKIKVNSQKDIIIDCIFDPY NEGRVVFKKKRVDVAIILKNKFVFNNVEISDFAIPLVAIEIKTNLDKNMLSGIEQSVDSLKETFPLCLYY CITELADFAIEKQNYASTHIDEVFILRKQKRGPVRRGTPLEVVHADLILEVVEQVGEHLSKFKDPIKTLK ARMTEGYLIKGKGK CAA74999.1 Bpu10I beta 382 >gi|2894389|emb|CAA74999.1| Bpu10I restriction endonuclease beta subunit subunit [Bacillus pumilus] MTQIDLSNTKHGSILFEKQKNVKEKYLQQAYKHYLYFRRSIDGLEITNDEAIFKLTQAANNYRDNVLYLF ESRPNSGQEAFRYTILEEFFYHLFKDLVKKKFNQEPSSIVMGKANSYVSLSFSPESFLGLYENPIPYIHT KDQDFVLGCAVDLKISPKNELNKENETEIVVPVIAIECKTYIERNMLDSCAATASRLKAAMPYCLYIVAS EYMKMDQAYPELTDIDEVFILCKASVGERTALKKKGLPPHKLDENLMVELFHMVERHLNRVWWSPNEALS RGRVIGRP ABM69266.1 BmrI 383 >gi|123187377|gb|ABM69266.1| BmrI [Bacillus megaterium] MNYFSLHPNVYATGRPKGLINMLESVWISNQKPGDGTMYLISGFANYNGGIRFYETFTEHINHGGKVIAI LGGSTSQRLSSKQVVAELVSRGVDVYIINRKRLLHAKLYGSSSNSGESLVVSSGNFTGPGMSQNVEASLL LDNNTTSSMGFSWNGMVNSMLDQKWQIHNLSNSNPTSPSWNLLYDERTTNLTLDDTQKVTLILTLGHADT ARIQAAPKSKAGEGSQYFWLSKDSYDFFPPLTIRNKRGTKATYSCLINMNYLDIKYIDSECRVTFEAENN FDFRLGTGKLRYTNVAASDDIAAITRVGDSDYELRIIKKGSSNYDALDSAAVNFIGNRGKRYGYIPNDEF GRIIGAKF CAC12783.1 BfiI 384 >gi|10798463|emb|CAC12783.1| restriction endonuclease BfiI [Bacillus firmus] MNFFSLHPNVYATGRPKGLIGMLENVWVSNHTPGEGTLYLISGFSNYNGGVRFYETFTEHINQGGRVIAI LGGSTSQRLSSRQVVEELLNRGVEVHIINRKRILHAKLYGTSNNLGESLVVSSGNFTGPGMSQNIEASLL LDNNTTQSMGFSWNDMISEMLNQNWHIHNMTNATDASPGWNLLYDERTTNLTLDETERVTLIVTLGHADT ARIQAAPGTTAGQGTQYFWLSKDSYDFFPPLTIRNRRGTKATYSSLINMNYIDINYTDTQCRVTFEAENN FDFRLGTGKLRYTGVAKSNDIAAITRVGDSDYELRIIKQGTPEHSQLDPYAVSFIGNRGKRFGYISNEEF GRIIGVTF P05725.1 I-CreI 385 
>gi|140470|sp|P05725.1|DNE1_CHLRE RecName: Full = DNA endonuclease I-CreI; AltName: Full = 23S rRNA intron protein MNTKYNKEFLLYLAGFVDGDGSIIAQIKPNQSYKFKHQLSLAFQVTQKTQRRWFLDKLVDEIGVGYVRDR GSVSDYILSEIKPLHNFLTQLQPFLKLKQKQANLVLKIIWRLPSAKESPDKFLEVCTWVDQIAALNDSKT RKTTSETVRAVLDSLSEKKKSSP Q9UQ84.2 hExoI 386 >gi|85700954|sp|Q9UQ84.21EXO1_HUMAN RecName: Full = Exonuclease 1; (EXO1_HUMAN) Short = hExo1; AltName: Full = Exonuclease I; Short = hExoI MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYVGFCMKFVNMLLSHG IKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVSEARECFTRSINITHAMAHKVIKAARSQ GVDCLVAPYEADAQLAYLNKAGIVQAIITEDSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVF TEEKFRYMCILSGCDYLSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANN TFLYQLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAH SRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPRKSSIVKRP RSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRK NEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSS DDIPNNHIPGDHIPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTAL QQFRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDS DSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKP ASIQKRKHHNAENKPGLQIKLNELWKNFGFKKDSEKLPPCKKPLSPVRDNIQLTPEAEEDIFNKPECGRV QRAIFQ P39875.2 Yeast ExoI 387 >gi|1706421|sp|P39875.2|EXO1_YEAST RecName: Full = Exodeoxyribonuclease (EXO1_YEAST) 1; AltName: Full = Exodeoxyribonuclease I; Short = EXO I; Short = Exonuclease I; AltName: Full = Protein DHS1 MGIQGLLPQLKPIQNPVSLRRYEGEVLAIDGYAWLHRAACSCAYEIAMGKPTDKYLQFFIKRFSLLKTFK VEPYLVFDGDAIPVKKSTESKRRDKRKENKAIAERLWACGEKKNAMDYFQKCVDITPEMAKCIICYCKLN GIRYIVAPFEADSQMVYLEQKNIVQGIISEDSDLLVFGCRRLITKLNDYGECLEICRDNFIKLPKKFPLG SLTNEEIITMVCLSGCDYINGIPKVGLITAMKLVRRFNTIERIILSIQREGKLMIPDTYINEYEAAVLAF QFQRVFCPIRKKIVSLNEIPLYLKDTESKRKRLYACIGFVIHRETQKKQIVHFDDDIDHHLHLKIAQGDL NPYDFHQPLANREHKLQLASKSNIEFGKINTTNSEAKVKPIESFFQKMTKLDHNPKVANNIHSLRQAEDK 
LTMAIKRRKLSNANVVQETLKDIRSKFFNKPSMTVVENFKEKGDSIQDFKEDTNSQSLEEPVSESQLSTQ IPSSFITTNLEDDDNLSEEVSEVVSDIEEDRKNSEGKTIGNEIYNTDDDGDGDTSEDYSETAESRVPTSS TTSFPGSSQRSISGCTKVLQKFRYSSSFSGVNANRQPLFPRHVNQKSRGMVYVNQNRDDDCDDNDGKNQI TQRPSLRKSLIGARSQRIVIDMKSVDERKSFNSSPILHEESKKRDIETTKSSQARPAVRSISLLSQFVYK GK BAJ43803.1 E.coli ExoI 388 >gi|315136644|dbj|BA.143803.1| exonuclease I [Escherichia coli DH1] MMNDGKQQSTFLFHDYETFGTHPALDRPAQFAAIRTDSEFNVIGEPEVFYCKPADDYLPQPGAVLITGIT PQEARAKGENEAAFAARIHSLFTVPKTCILGYNNVRFDDEVTRNIFYRNFYDPYAWSWQHDNSRWDLLDV MRACYALRPEGINWPENDDGLPSFRLEHLTKANGIEHSNAHDAMADVYATIAMAKLVKTRQPRLFDYLFT HRNKHKLMALIDVPQMKPLVHVSGMFGAWRGNTSWVAPLAWHPENRNAVIMVDLAGDISPLLELDSDTLR ERLYTAKTDLGDNAAVPVKLVHINKCPVLAQANTLRPEDADRLGINRQHCLDNLKILRENPQVREKVVAI FAEAEPFTPSDNVDAQLYNGFFSDADRAAMKIVLETEPRNLPALDITFVDKRIEKLLFNYRARNFPGTLD YAEQQRWLEHRRQVFTPEFLQGYADELQMLVQQYADDKEKVALLKALWQYAEEIV Q9BQ50.1 Human TREX2 389 >gi|47606206|sp|Q98Q50.1|TREX2_HUMAN RecName: Full = Three prime repair exonuclease 2; AltName: Full = 3′-5′ exonuclease TREX2 MGRAGSPLPRSSWPRMDDCGSRSRCSPTLCSSLRTCYPRGNITMSEAPRAETFVFLDLEATGLPSVEPEI AELSLFAVHRSSLENPEHDESGALVLPRVLDKLTLCMCPERPFTAKASEITGLSSEGLARCRKAGFDGAV VRTLQAFLSRQAGPICLVAHNGFDYDFPLLCAELRRLGARLPRDTVCLDTLPALRGLDRAHSHGTRARGR QGYSLGSLFHRYFRAEPSAAHSAEGDVHTLLLIFLHRAAELLAWADEQARGWAHIEPMYLPPDDPSLEA Q91XB0.2 Mouse TREX1 390 >gi|47606196|sp|Q91XB0.2|TREX1_MOUSE RecName: Full = Three prime repair exonuclease 1; AltName: Full = 3′-5′ exonuclease TREX1 MGSQTLPHGHMQTLIFLDLEATGLPSSRPEVTELCLLAVHRRALENTSISQGHPPPVPRPPRVVDKLSLC IAPGKACSPGASEITGLSKAELEVQGRQRFDDNLAILLRAFLQRQPQPCCLVAHNGDRYDFPLLQTELAR LSTPSPLDGTFCVDSIAALKALEQASSPSGNGSRKSYSLGSIYTRLYWQAPTDSHTAEGDVLTLLSICQW KPQALLQWVDEHARPFSTVKPMYGTPATTGTTNLRPHAATATTPLATANGSPSNGRSRRPKSPPPEKVPE APSQEGLLAPLSLLTLLTLAIATLYGLFLASPGQ Q9NSU2.1 Human TREX1 391 >gi|47606216|sp|Q9NSU2.1|TREX1_HUMAN RecName: Full = Three prime repair exonuclease 1; AltName: Full = 3′-5′ exonuclease TREX1; AltName: Full = DNase III 
MGPGARRQGRIVQGRPEMCFCPPPTPLPPLRILTLGTHTPTPCSSPGSAAGTYPTMGSQALPPGPMQTLI FFDMEATGLPFSQPKVTELCLLAVHRCALESPPTSQGPPPTVPPPPRVVDKLSLCVAPGKACSPAASEIT GLSTAVLAAHGRQCFDDNIANLLLAFLRRQPQPWCLVAHNGDRYDFPLLQAELAMLGLTSALDGAFCVDS ITALKALERASSPSEHGPRKSYSLGSIYTRLYGQSPPDSHTAEGDVLALLSICQWRPQALLRWVDAHARP FGTIRPMYGVTASARTKPRPSAVTTTAHLATTRNTSPSLGESRGTKDLPPVKDPGALSREGLLAPLGLLA ILTLAVATLYGLSLATPGE Q9BG99.1 Bovine TREX1 392 >gi|47606205|sp|Q9BG99.1|TREX1_BOVIN RecName: Full = Three prime repair exonuclease 1; AltName: Ful1 = 3′-5′ exonuclease TREX1 MGSRALPPGPVQTLIFLDLEATGLPFSQPKITELCLLAVHRYALEGLSAPQGPSPTAPVPPRVLDKLSLC VAPGKVCSPAASEITGLSTAVLAAHGRRAFDADLVNLIRTFLQRQPQPWCLVAHNGDRYDFPLLRAELAL LGLASALDDAFCVDSIAALKALEPTGSSSEHGPRKSYSLGSVYTRLYGQAPPDSHTAEGDVLALLSVCQW RPRALLRWVDAHAKPFSTVKPMYVITTSTGTNPRPSAVTATVPLARASDTGPNLRGDRSPKPAPSPKMCP GAPPGEGLLAPLGLLAFLTLAVAMLYGLSLAMPGQ AAH91242.1 Rat TREX1 393 >gi|60688197|gb|AAH91242.1| Trex1 protein [Rattus norvegicus] MGSQALPHGHMQTLIFLDLEATGLPYSQPKITELCLLAVHRHALENSSMSEGQPPPVPKPPRVVDKLSLC IAPGKPCSSGASEITGLTTAGLEAHGRQRFNDNLATLLQVFLQRQPQPCCLVAHNGDRYDFPLLQAELAS LSVISPLDGTFCVDSIAALKTLEQASSPSEHGPRKSYSLGSIYTRLYGQAPTDSHTAEGDVLALLSICQW KPQALLQWVDKHARPFSTIKPMYGMAATTGTASPRLCAATTSSPLATANLSPSNGRSRGKRPTSPPPENV PEAPSREGLLAPLGLLTFLTLAIAVLYGIFLASPGQ AAH63664.1 Human DNA2 394 >gi|39793966|gb|AAH63664.1| DNA2 protein [Homo sapiens] FAIPASRMEQLNELELLMEKSFWEEAELPAELFQKKVVASFPRTVLSTGMDNRYLVLAVNTVQNKEGNCE KRLVITASQSLENKELCILRNDWCSVPVEPGDIIHLEGDCTSDTWIIDKDFGYLILYPDMLISGTSIASS IRCMRRAVLSETFRSSDPATRQMLIGTVLHEVFQKAINNSFAPEKLQELAFQTIQEIRHLKEMYRLNLSQ DEIKQEVEDYLPSFCKWAGDFMHKNTSTDFPQMQLSLPSDNSKDNSTCNIEVVKPMDIEESIWSPRFGLK GKIDVTVGVKIHRGYKTKYKIMPLELKTGKESNSIEHRSQVVLYTLLSQERRADPEAGLLLYLKTGQMYP VPANHLDKRELLKLRNQMAFSLFHRISKSATRQKTQLASLPQIIEEEKTCKYCSQIGNCALYSRAVEQQM DCSSVPIVMLPKIEEETQHLKQTHLEYFSLWCLMLTLESQSKDNKKNHQNIWLMPASEMEKSGSCIGNLI RMEHVKIVCDGQYLHNFQCKHGAIPVTNLMAGDRVIVSGEERSLFALSRGYVKEINMTTVTCLLDRNLSV LPESTLFRLDQEEKNCDIDTPLGNLSKLMENTFVSKKLRDLIIDFREPQFISYLSSVLPHDAKDTVACIL 
KGLNKPQRQAMKKVLLSKDYTLIVGMPGIGKTTTICTLVPAPEQVEKGGVSNVTEAKLIVFLTSIFVKAG CSPSDIGIIAPYRQQLKIINDLLARSIGMVEVNTVDKYQGRDKSIVLVSFVRSNKDGTVGELLKDWRRLN VAITRAKHKLILLGCVPSLNCYPPLEKLLNHLNSEKLIIDLPSREHESLCHILGDFQRE P38859.1 Yeast DNA2 395 >gi|731738|sp|P38859.1|DNA2_YEAST RecName: Full = DNA replication (DNA2_YEAST) ATP-dependent helicase DNA2 MPGTPQKNKRSASISVSPAKKTEEKEIIQNDSKAILSKQTKRKKKYAFAPINNLNGKNTKVSNASVLKSI AVSQVRNTSRTKDINKAVSKSVKQLPNSQVKPKREMSNLSRHHDFTQDEDGPMEEVIWKYSPLQRDMSDK TTSAAEYSDDYEDVQNPSSTPIVPNRLKTVLSFTNIQVPNADVNQLIQENGNEQVRPKPAEISTRESLRN IDDILDDIEGDLTIKPTITKFSDLPSSPIKAPNVEKKAEVNAEEVDKMDSTGDSNDGDDSLIDILTQKYV EKRKSESQITIQGNTNQKSGAQESCGKNDNTKSRGEIEDHENVDNQAKTGNAFYENEEDSNCQRIKKNEK IEYNSSDEFSDDSLIELLNETQTQVEPNTIEQDLDKVEKMVSDDLRIATDSTLSAYALRAKSGAPRDGVV RLVIVSLRSVELPKIGTQKILECIDGKGEQSSVVVRHPWVYLEFEVGDVIHIIEGKNIENKRLLSDDKNP KTQLANDNLLVLNPDVLFSATSVGSSVGCLRRSILQMQFQDPRGEPSLVMTLGNIVHELLQDSIKYKLSH NKISMEIIIQKLDSLLETYSFSIIICNEEIQYVKELVMKEHAENILYFVNKFVSKSNYGCYTSISGTRRT QPISISNVIDIEENIWSPIYGLKGFLDATVEANVENNKKHIVPLEVKTGKSRSVSYEVQGLIYTLLLNDR YEIPIEFFLLYFTRDKNMTKFPSVLHSIKHILMSRNRMSMNFKHQLQEVFGQAQSRFELPPLLRDSSCDS CFIKESCMVLNKLLEDGTPEESGLVEGEFEILTNHLSQNLANYKEFFTKYNDLITKEESSITCVNKELFL LDGSTRESRSGRCLSGLVVSEVVEHEKTEGAYIYCFSRRRNDNNSQSMLSSQIAANDFVIISDEEGHFCL CQGRVQFINPAKIGISVKRKLLNNRLLDKEKGVTTIQSVVESELEQSSLIATQNLVTYRIDKNDIQQSLS LARFNLLSLFLPAVSPGVDIVDERSKLCRKTKRSDGGNEILRSLLVDNRAPKFRDANDDPVIPYKLSKDT TLNLNQKEAIDKVMRAEDYALILGMPGTGKTIVIAEIIKILVSEGKRVLLTSYTHSAVDNILIKLRNTNI SIMRLGMKHKVHPDTQKYVPNYASVKSYNDYLSKINSTSVVATTCLGINDILFTLNEKDFDYVILDEASQ ISMPVALGPLRYGNRFIMVGDHYQLPPLVKNDAARLGGLEESLFKTFCEKHPESVAELTLQYRMCGDIVT LSNFLIYDNKLKCGNNEVFAQSLELPMPEALSRYRNESANSKQWLEDILEPTRKVVFLNYDNCPDIIEQS EKDNITNHGEAELTLQCVEGMLLSGVPCEDIGVMTLYRAQLRLLKKIFNKNVYDGLEILTADQFQGRDKK CIIISMVRRNSQLNGGALLKELRRVNVAMTRAKSKLIIIGSKSTIGSVPEIKSFVNLLEERNWVYTMCKD ALYKYKFPDRSNAIDEARKGCGKRTGAKPITSKSKFVSDKPIIKEILQEYES AAA45863.1 VP16 396 >gi|330318|gb|AAA45863.1| VP16 [Human herpesvirus 2] 
MDLLVDDLFADRDGVSPPPPRPAGGPKNTPAAPPLYATGRLSQAQLMPSPPMPVPPAALFNRLLDDLGFS AGPALCTMLDTWNEDLFSGFPTNADMYRECKFLSTLPSDVIDWGDAHVPERSPIDIRAHGDVAFPTLPAT RDELPSYYEAMAQFFRGELRAREESYRTVLANFCSALYRYLRASVRQLHRQAHMRGRNRDLREMLRITIA DRYYRETARLARVLFLHLYLFLSREILWAAYAEQMMRPDLFDGLCCDLESWRQLACLFQPLMFINGSLTV RGVPVEARRLRELNHIREHLNLPLVRSAAAEEPGAPLTTPPVLQGNQARSSGYFMLLIRAKLDSYSSVAT SEGESVMREHAYSRGRTRNNYGSTIEGLLDLPDDDDAPAEAGLVAPRMSFLSAGQRPRRLSTTAPITDVS LGDELRLDGEEVDMTPADALDDFDLEMLGDVESPSPGMTHDPVSYGALDVDDFEFEQMFTDAMGIDDFGG Gene ID: 5932 RBBP8 397 MNISGSSCGSPNSADTSSDFKDLWTKLKECHDREVQGLQVKVTKLKQERILDAQRLEEFF UniProtKB/ retinoblastoma TKNQQLREQQKVLHETIKVLEDRLRAGLCDRCAVTEEHMRKKQQEFENIRQQNLKLITEL Swiss-Prot: binding MNERNTLQEENKKLSEQLQQKIENDQQHQAAELECEEDVIPDSPITAFSFSGVNALRRKE Q99708 protein 8 NPHVRYIEQTHTKLEHSVCANEMRKVSKSSTHPQHNPNENEILVADTYDQSQSPMAKAHG TSSYTPDKSSFNLATVVAETLGLGVQEESETQGPMSPLGDELYHCLEGNHKKQPFEESTR NTEDSLRFSDSTSKTPPQEELPTRVSSPVFGATSSIKSGLDLNTSLSPSLLQPGKKKHLK TLPFSNTCISRLEKTRSKSEDSALFTHHSLGSEVNKIIIQSSNKQILINKNISESLGEQN RTEYGKDSNTDKHLEPLKSLGGRTSKRKKTEEESEHEVSCPQASFDKENAFPFPMDNQFS MNGDCVMDKPLDLSDRFSAIQRQEKSQGSETSKNKFRQVTLYEALKTIPKGFSSSRKASD GNCTLPKDSPGEPCSQECIILQPLNKCSPDNKPSLQIKEENAVFKIPLRPRESLETENVL DDIKSAGSHEPIKIQTRSDHGGCELASVLQLNPCRTGKIKSLQNNQDVSFENIQWSIDPG ADLSQYKMDVTVIDTKDGSQSKLGGETVDMDCTLVSETVLLKMKKQEQKGEKSSNEERKM NDSLEDMFDRTTHEEYESCLADSFSQAADEEEELSTATKKLHTHGDKQDKVKQKAFVEPY FKGDERETSLQNFPHIEVVRKKEERRKLLGHTCKECEIYYADMPAEEREKKLASCSRHRF RYIPPNTPENFWEVGFPSTQTCMERGYIKEDLDPCPRPKRRQPYNAIFSPKGKEQKT ACM07430.1 Colicin E9 598 >gi|221185856|gb|ACM07430.1| colicin E9 [Escherichia coli] MSGGDGRGHNTGAHSTSGNINGGPTGIGVSGGASDGSGWSSENNPWGGGSGSGIHWGGGSGRGNGGG NGN SGGGSGTGGNLSAVAAPVAFGFPALSTPGAGGLAVSISASELSAAIAGIIAKLKKVNLKFTPFGVVL SSL IPSEIAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVPMSV PVV DAKPTERPGVFTASIPGAPVLNISVNDSTPAVQTLSPGVTNNTDKDVRPAGFTQGGNTRDAVIRFPK DSG HNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNYERARAELNQANEDVARNQERQAK AVQ 
VYNSRKSELDAANKTLADAIAEIKQFNRFAHDPMAGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAK EKS DADAALSAAQERRKQKENKEKDAKDKLDKESKRNKPGKATGKGKPVGDKWLDDAGKDSGAPIPDRIA DKL RDKEFKSFDDFRKAVWEEVSKDPELSKNLNPSNKSSVSKGYSPFTPKNQQVGGRKVYELHHDKPISQ GGE VYDMDNIRVTTPKRHIDIHRGK NP_775816.1 APFL 599 >gi|135233|sp|P14870.1|T2F1_PLAOK RecName: Full = Type-2 restriction enzyme FokI; Short = R.FokI; AltName: Full = Endonuclease FokI; AltName: Full = Type II restriction enzyme FokI; AltName: Full = Type IIS restriction enzyme FokI MFLSMVSKIRTFGWVQNPGKFENLKRVVQVFDRNSKVHNEVKNIKIPTLVKESKIQKELVAIMNQHD LIY TYKELVGTGTSIRSEAPCDAIIQATIADQGNKKGYIDNWSSDGFLRWAHALGFIEYINKSDSFVITD VGL AYSKSADGSAIEKEILIEAISSYPPAIRILTLLEDGQHLTKFDLGKNLGFSGESGFTSLPEGILLDT LAN AMPKDKGEIRNNWEGSSDKYARMIGGWLDKLGLVKQGKKEFIIPTLGKPDNKEFISHAFKITGEGLK VLR RAKGSTKFTRVPKRVYWEMLATNLTDKEYVRTRRALILEILIKAGSLKIEQIQDNLKKLGFDEVIET IEN DIKGLINTGIFIEIKGRFYQLKDHILQFVIPNRGVTKQLVKSELEEKKSELRHKLKYVPHEYIELIE IAR NSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQAD EMQ RYVEENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLI GGE MIKAGTLTLEEVRRKFNNGEINF P14870.1 FokI 600 >gi|221185857|gb|ACM07431.1| colicin E9 immunity protein [Escherichia coli] MELKHSISDYTEAEFLQLVTTICNADTSSEEELVKLVTHFEEMTEHPSGSDLIYYPKEGDDDSPSGI VNT VKQWRAANGKSGFKQG

In this last case, the chimeric protein according to the invention can comprise, from its N-terminus toward its C-terminus: a first set of RVDs, a first linker, a protein domain, a second linker and a second set of RVDs. In another example for this case, the chimeric protein according to the invention can comprise an N-terminus domain as previously mentioned, a first set of RVDs, a first linker, a protein domain, a second linker, a second set of RVDs and a C-terminus domain as previously mentioned (as illustrated in FIG. 9B). As exemplified above, the linkers used in this type of chimeric protein according to the present invention can be active linkers comprising active domains which allow a change of their structure under appropriate stimulation.
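The segment order described above can be sketched in a few lines of Python. This is only an illustration of the N-terminus to C-terminus ordering: the class name `ChimericLayout`, its field names, the linker labels and the example RVD sets are hypothetical placeholders and are not taken from the specification; the RVD labels (HD, NG, NI, NN) and the MmeI domain are used purely as example values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChimericLayout:
    """Hypothetical container for the N-to-C segment order (illustration only)."""
    first_rvds: List[str]
    protein_domain: str
    second_rvds: List[str]
    first_linker: str = "linker_1"            # placeholder label, not a real linker
    second_linker: str = "linker_2"           # placeholder label, not a real linker
    n_terminus_domain: Optional[str] = None   # optional additional N-terminus domain
    c_terminus_domain: Optional[str] = None   # optional additional C-terminus domain

    def segments(self) -> List[str]:
        """Return segment labels ordered from N-terminus to C-terminus."""
        core = [
            "RVDs(" + "-".join(self.first_rvds) + ")",
            self.first_linker,
            self.protein_domain,
            self.second_linker,
            "RVDs(" + "-".join(self.second_rvds) + ")",
        ]
        head = [self.n_terminus_domain] if self.n_terminus_domain else []
        tail = [self.c_terminus_domain] if self.c_terminus_domain else []
        return head + core + tail


# First arrangement: RVDs, linker, protein domain, linker, RVDs.
minimal = ChimericLayout(["HD", "NG"], "MmeI", ["NI", "NN"])
# Second arrangement: the same core flanked by N- and C-terminus domains.
extended = ChimericLayout(["HD", "NG"], "MmeI", ["NI", "NN"],
                          n_terminus_domain="N-term", c_terminus_domain="C-term")
print(" > ".join(minimal.segments()))
print(" > ".join(extended.segments()))
```

The optional N-terminus and C-terminus fields are modelled as independent so that any combination of the two flanking domains can be represented with the same core.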

In the scope of the present invention, the chimeric protein comprises a core scaffold with any combination between an additional C-terminus domain according to the invention and an additional N-terminus domain according to the invention.

In another embodiment, said protein domain within the chimeric protein according to the present invention is a first protein subdomain interacting with a second protein subdomain to form said catalytic entity able to process said nucleic acid target sequence. In a preferred embodiment, said first protein subdomain is selected from the group listed in Table 2 (SEQ ID NO: 339 to 397), such as MmeI (SEQ ID NO: 339), R.PleI (SEQ ID NO: 369) and MlyI (SEQ ID NO: 370) as non-limiting examples, or a functional mutant, a variant or a derivative of these protein subdomains. In another preferred embodiment, said second protein subdomain is selected from the group listed in Table 2 (SEQ ID NO: 339 to 397 and SEQ ID NO: 598-599), such as MmeI (SEQ ID NO: 339), R.PleI (SEQ ID NO: 369) and MlyI (SEQ ID NO: 370) as non-limiting examples, or a functional mutant, a variant or a derivative of these protein subdomains.

In another embodiment, said protein domain within the chimeric protein according to the present invention is a first protein subdomain interacting with a second protein subdomain to form a protein entity catalytically active. Said first protein subdomain can be used to measure, quantify or provoke protein-protein interactions at said nucleic target sequence according to the present invention. Said first protein subdomain can be a protein module or protein subdomain known to mediate protein-protein interaction in cell signaling. Said first protein subdomain can be used for diagnosis, analytical or therapeutic applications. Said protein entity can be a reporter protein such as a fluorescent protein, luciferase, β-galactosidase as non-limiting examples. In this case, a first part of the reporter protein can be fused to said first protein subdomain according to the present invention and a second part of the reporter protein can be fused to said second protein subdomain, wherein said reporter protein is only active when said first and second protein subdomains according to the present invention interact. In another embodiment, said first protein subdomain or the protein entity resulting from the interaction between first and second subdomains can be used as intracellular sensor for calcium level, pH, redox environment as non-limiting examples. In another embodiment, said protein domain or subdomains are usable for applications such as Fluorescence Resonance Energy Transfer (FRET) as a non-limiting example. In another embodiment, said protein domain or subdomains are coupled to a dye.

In another embodiment, said protein domain within the chimeric protein according to the present invention can be an inactive subdomain and can react with more than one protein domain or subdomain to form an active protein entity, i.e. said active entity can comprise two, three, four or several subdomains and can be an enzyme or a fluorescent protein as non-limiting examples. In another embodiment, said active entity formed with the chimeric protein according to the present invention can interact or react with another protein or protein domain having a different activity. In another embodiment, said active entity formed with the chimeric protein according to the present invention can be associated within, or located 5′ or 3′ of, the nucleic acid target sequence location with another protein or protein domain having a different activity in order to process said nucleic acid target sequence; as a non-limiting example, said chimeric protein according to the present invention can comprise a protein domain with a cleavase activity for its nucleic acid target sequence and can be associated with an exonuclease activity to increase the mutagenesis rate at its nucleic acid target sequence location.

In another embodiment, said second protein subdomain interacting with said first protein subdomain to form said catalytic entity able to process said nucleic acid target sequence according to the present invention is fused to a core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a second unique nucleic sequence adjacent to said nucleic acid target sequence wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. More preferably, RVDs associated with recognition of the nucleotides C, T, A, G/A and G respectively are selected from the group consisting of NN or NK for recognizing G, HD for recognizing C, NG for recognizing T and NI for recognizing A, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. In another embodiment, RVDs associated with recognition of the nucleotide C are selected from the group consisting of N* and RVDs associated with recognition of the nucleotide T are selected from the group consisting of N* and H*, where * denotes a gap in the repeat sequence that corresponds to a lack of amino acid residue at the second position of the RVD. In another embodiment, critical amino acids 12 and 13 can be mutated towards other amino acid residues in order to modulate their specificity towards nucleotides A, T, C and G and in particular to enhance this specificity. By other amino acid residues is meant any of the twenty natural amino acid residues or unnatural amino acid derivatives.

In another embodiment, said core scaffold fused to said second protein subdomain can have the same scope of characteristics as those previously listed to describe the chimeric protein according to the present invention, regarding the origin of said core scaffold, the number of RVDs comprised in said core scaffold, the nature of those RVDs (natural, artificial or RVDs-like domains), the existence of additional N-terminus or C-terminus or both domains on this core scaffold, the existence of one or several localization signals on this core scaffold, the existence of one or several peptidic linkers on this core scaffold to fuse one or several protein domains on this core scaffold.

In another aspect of the invention, said chimeric protein according to the present invention can function as a dimer wherein a first and a second monomer are derived from a Transcription Activator-Like Effector (TALE). In another embodiment, said chimeric protein according to the present invention can function as a dimer wherein said first monomer comprises:

(i) A core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a first nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A.

(ii) A protein domain part of a catalytic entity able to process said nucleic acid target sequence;

and wherein said second monomer comprises:

(i) A core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a second nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A.

(ii) A protein domain part of a catalytic entity able to process said nucleic acid target sequence;

thereby obtaining a chimeric protein or a chimeric dimer protein comprising a catalytic entity able to process said nucleic acid target sequence.

More preferably, in said first and second monomers RVDs associated with recognition of the nucleotides C, T, A, G/A and G respectively are selected from the group consisting of NN or NK for recognizing G, HD for recognizing C, NG for recognizing T and NI for recognizing A, TL for recognizing A, VT for recognizing A or G and SW for recognizing A. In another embodiment, in said first and second monomers RVDs associated with recognition of the nucleotide C are selected from the group consisting of N* and RVDs associated with recognition of the nucleotide T are selected from the group consisting of N* and H*, where * denotes a gap in the repeat sequence that corresponds to a lack of amino acid residue at the second position of the RVD. In another embodiment, critical amino acids 12 and 13 in said first and second monomers can be mutated towards other amino acid residues in order to modulate their specificity towards nucleotides A, T, C and G and in particular to enhance this specificity. By other amino acid residues is meant any of the twenty natural amino acid residues or unnatural amino acid derivatives.

In another embodiment, said core scaffolds of the first and second monomers can have the same scope of characteristics as those previously listed to describe the chimeric protein according to the present invention, regarding the origin of said core scaffolds, the number of RVDs comprised in said core scaffolds, the nature of those RVDs (natural, artificial or RVDs-like domains), the existence of additional N-terminus or C-terminus or both domains on these core scaffolds, the existence of one or several localization signals on these core scaffolds, the existence of one or several peptidic linkers on these core scaffolds to fuse one or several protein domains on these core scaffolds.

In a preferred embodiment, at least one monomer is selected from the group consisting of SEQ ID NO: 19 to 133, SEQ ID NO: 180-182, and SEQ ID NO: 186-188, a functional mutant, a variant or a derivative thereof. In a preferred embodiment, said first and second monomers are selected from the group consisting of SEQ ID NO: 19 to 133, SEQ ID NO: 180-182, and SEQ ID NO: 186-188, functional mutants, variants or derivatives thereof.

In another embodiment, said first and second monomers are fused by a peptidic linker forming a single polypeptide chain for simple and efficient vectorization. In another embodiment, said peptidic linker contains one or several active domains which allow its deployment under stimulation, as previously mentioned.

In another embodiment, said first and second monomers have the same amino acid sequences and recognize the same nucleic acid sequence adjacent to said nucleic acid target sequence. In another embodiment, said first and second monomers have different amino acid sequences and recognize the same nucleic acid sequence adjacent to said nucleic acid target sequence, i.e. first and second monomers are isoschizomers. In another embodiment, said first and second monomers have the same amino acid sequences and recognize different nucleic acid sequences adjacent to said nucleic acid target sequence because of TAL code degeneracy. In another embodiment, said first and second monomers have different amino acid sequences and recognize different nucleic acid sequences adjacent to said nucleic acid target sequence.

In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are on the same nucleic acid strand and adjacent to said nucleic acid target sequence. In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are adjacent to said nucleic acid target sequence but not on the same nucleic acid strand. In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are located 5′ of said nucleic acid target sequence. In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are located 3′ of said nucleic acid target sequence. In another embodiment, said chimeric protein according to the present invention binds a first nucleic acid sequence which is located 5′ of said nucleic acid target sequence and a second nucleic acid sequence which is located 3′ of said nucleic acid target sequence.

In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are adjacent to said nucleic acid target sequence and separated by a nucleic acid sequence (i.e. the spacer) whose length (i.e. the spacer length) is 5 to 40 base pairs (bp). In another embodiment, said chimeric protein according to the present invention binds a first and a second nucleic acid sequence which are adjacent to said nucleic acid target sequence and separated by a spacer 8 bp in length.
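As a purely illustrative sketch, the spacer length between two binding sequences can be computed from their coordinates and compared with the 5-40 bp range stated above. The function names and the coordinate convention (half-open `(start, end)` intervals on a single reference strand) are assumptions made for this example and are not taken from the description.

```python
def spacer_length(first_site, second_site):
    """Number of base pairs between two binding sites.

    Each site is a hypothetical (start, end) pair of 0-based, half-open
    coordinates on the same reference strand."""
    (s1, e1), (s2, e2) = sorted([first_site, second_site])
    if s2 < e1:
        raise ValueError("binding sites overlap")
    return s2 - e1


def spacer_in_range(first_site, second_site, min_bp=5, max_bp=40):
    """True if the spacer length falls within min_bp..max_bp inclusive
    (defaults follow the 5-40 bp range given in the text)."""
    return min_bp <= spacer_length(first_site, second_site) <= max_bp
```

For example, two 16 bp binding sequences at positions 0-16 and 24-40 are separated by an 8 bp spacer, which lies within the stated range.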

Some structures of chimeric dimer proteins according to the invention are shown in FIG. 9.

In another embodiment, said chimeric dimer protein according to the present invention can be associated with a third chimeric protein comprising:

(i) A core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A.

(ii) A protein catalytic domain.

Said third chimeric protein can have the same scope of characteristics as those previously listed regarding the origin of said core scaffold, the number of RVDs comprised in said core scaffold, the nature of those RVDs (natural, artificial or RVDs-like domains), the existence of additional N-terminus or C-terminus or both domains on this core scaffold, the existence of one or several localization signals on this core scaffold, the existence of one or several peptidic linkers on this core scaffold to fuse one or several protein domains on this core scaffold and the nature and characteristics of its protein(s) domain(s) or subdomain(s).

In another embodiment, said third chimeric protein binds a nucleic acid sequence within the nucleic acid target sequence to be processed by said first and second monomers previously described. In other words, said third chimeric protein can have a binding sequence within the spacer separating the first and second nucleic acid sequences recognized and bound by the first and second monomers of the chimeric dimer protein according to the invention. In another embodiment, said third chimeric protein comprises a protein domain with a catalytic activity to process the nucleic acid target sequence that is different from that of the catalytically active entity formed by the protein subdomains of the respective first and second monomers of the chimeric dimer protein according to the present invention. As a non-limiting example, first and second monomer protein subdomains can form a catalytically active entity with a cleavase activity towards the nucleic acid target sequence and said third chimeric protein can comprise a protein domain with an exonuclease activity to increase the mutagenesis rate at the nucleic acid target sequence location. In another embodiment, said third chimeric protein binds a nucleic acid sequence located 5′ of the nucleic acid target sequence to be processed by said first and second monomers previously described. In another embodiment, said third chimeric protein binds a nucleic acid sequence located 3′ of the nucleic acid target sequence to be processed by said first and second monomers previously described.

In another embodiment said chimeric dimer protein according to the present invention can be associated with a core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A, YG for recognizing T, TL for recognizing A, VT for recognizing A or G, SW for recognizing A, N* for recognizing C or T and H* for recognizing T (where * denotes a gap in the repeat sequence that corresponds to a lack of amino acid residue at the second position of the RVD). In other words, said core scaffold associated with said chimeric dimer protein according to the present invention can have a binding sequence within the spacer separating the first and a second nucleic acid sequences recognized and bound by the first and second monomers of the chimeric dimer protein according to the invention. In another embodiment, said core scaffold associated with said chimeric dimer protein according to the present invention allows to control the processing activity of said chimeric dimer protein according to the present invention on its nucleic acid target sequence. In other words, said core scaffold associated with said chimeric dimer protein according to the present invention allows to block the access of said chimeric dimer protein according to the present invention on its nucleic acid target sequence. 
In another embodiment, the expression of said core scaffold to control the processing activity of said chimeric dimer protein can be a cell-cycle or tissue dependent expression, allowing a cell-cycle or tissue dependent control of said chimeric dimer protein activity towards its nucleic acid target sequence. Such a blocking core scaffold can also be used in combination with a chimeric protein according to the invention wherein said core scaffold binds the nucleic acid target sequence of said chimeric protein according to the invention to allow a control, a cell-cycle or time dependent control of said chimeric protein activity towards its nucleic acid sequence.

In another embodiment, said chimeric protein according to the present invention can function as a trimer wherein a third monomer is derived from a Transcription Activator-Like Effector (TALE) and comprises:

(i) A core scaffold comprising a set of Repeat Variable Dipeptide regions (RVDs) able to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process wherein each RVD comprises a pair of amino acids responsible for recognizing one nucleotide selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A and YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A.

(ii) A protein domain part of a catalytic entity able to process said nucleic acid target sequence;

thereby obtaining a chimeric protein or a chimeric trimer protein comprising a catalytic entity able to process said nucleic acid target sequence.

Said third monomer core can have the same scope of characteristics as those previously listed to describe a chimeric protein according to the present invention, regarding the origin of said core scaffold, the number of RVDs comprised in said core scaffold, the nature of those RVDs (natural, artificial or RVDs-like domains), the existence of additional N-terminus or C-terminus or both domains on this core scaffold, the existence of one or several localization signals on this core scaffold, the existence of one or several peptidic linkers on this core scaffold to fuse one or several protein domains on this core scaffold and the nature and characteristics of its protein(s) domain(s) or subdomain(s).

In another embodiment, said third monomer binds a nucleic acid sequence within the nucleic acid target sequence to be processed by said first and second monomers previously described. In other words, said third monomer can have a binding sequence within the spacer separating the first and second nucleic acid sequences recognized and bound by the first and second monomers of the chimeric trimer protein according to the invention. In another embodiment, said third monomer binds a nucleic acid sequence located 5′ of the nucleic acid target sequence to be processed by said first and second monomers previously described. In another embodiment, said third monomer binds a nucleic acid sequence located 3′ of the nucleic acid target sequence to be processed by said first and second monomers previously described.
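The three placements of the third binding sequence described above (within the spacer, 5′ or 3′) can be sketched as a simple interval comparison. This is only an illustrative sketch: the function name `locate_third_site`, the half-open `(start, end)` coordinates on one reference strand, and the choice to measure 5′ and 3′ relative to the whole region spanned by the first and second binding sequences are all assumptions of this example.

```python
def locate_third_site(first_site, second_site, third_site):
    """Classify where a third binding site lies relative to the two sites
    bound by the first and second monomers.

    All sites are hypothetical (start, end) pairs of 0-based, half-open
    coordinates on the same reference strand, read 5' to 3'."""
    (s1, e1), (s2, e2) = sorted([first_site, second_site])
    t_start, t_end = third_site
    if e1 <= t_start and t_end <= s2:
        return "within spacer"
    if t_end <= s1:
        return "5'"
    if t_start >= e2:
        return "3'"
    return "overlaps a monomer binding site"
```

With first and second binding sequences at 30-46 and 66-82, a third sequence at 48-64 is classified as lying within the spacer, one at 5-20 as 5′ and one at 90-105 as 3′.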

In another embodiment, said third protein subdomain can be used to measure, quantify or provoke protein-protein interactions at said nucleic target sequence according to the present invention. Said third protein subdomain can be a protein module or protein subdomain known to mediate protein-protein interaction in cell signaling. Said third protein subdomain can be used for diagnosis, analytical or therapeutic applications. Said third protein subdomain can be usable for applications such as Fluorescence Resonance Energy Transfer (FRET) as non-limiting example. In another preferred embodiment, said third protein subdomain is usable in FRET as a donor molecule, subdomains of first and second monomers being successive and compatible acceptor molecules. In another preferred embodiment, said third protein subdomain is usable in FRET as an acceptor molecule, one of the subdomains of first and second monomers being respectively successive and compatible donor and acceptor molecules. Said third protein subdomain can be coupled to a dye. In other words, first, second and third subdomains of respective monomers constituting the chimeric trimer protein according to the invention can be a successive and compatible trio of “dyes” usable in FRET applications, more specific than the usual FRET applications using two successive and compatible dyes.

In another aspect of the invention, is also encompassed a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention.

Is also encompassed a vector comprising a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention.

Is also encompassed a host cell which comprises a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention.

Is also encompassed in the scope of the present invention a non-human transgenic animal comprising a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention.

Is also encompassed in the scope of the present invention a transgenic plant comprising a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention.

The present invention also relates to a kit comprising a chimeric protein or a monomer of a chimeric dimer protein according to the present invention or a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention and instructions for use of said kit.

The present invention also relates to a composition comprising a chimeric protein or a monomer of a chimeric dimer protein according to the present invention or a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention and a carrier. More preferably, said composition is a pharmaceutical composition comprising a chimeric protein or a monomer of a chimeric dimer protein according to the present invention or a vector and/or a recombinant polynucleotide encoding a chimeric protein or a monomer of a chimeric dimer protein as previously described according to the present invention and a pharmaceutically active carrier.

For purposes of therapy, the chimeric protein or a monomer of a chimeric dimer protein according to the present invention and a pharmaceutically acceptable excipient are administered in a therapeutically effective amount. Such a combination is said to be administered in a “therapeutically effective amount” if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of the recipient. In the present context, an agent is physiologically significant if its presence results in a decrease in the severity of one or more symptoms of the targeted disease and in a genome correction of the lesion or abnormality. Vectors comprising targeting DNA and/or nucleic acid encoding chimeric protein or a monomer of a chimeric dimer protein according to the present invention can be introduced into a cell by a variety of methods (e.g., injection, direct uptake, projectile bombardment, liposomes, electroporation). Chimeric proteins or monomers of chimeric dimer proteins according to the present invention can be stably or transiently expressed into cells using expression vectors. Techniques of expression in eukaryotic cells are well known to those in the art. (See Current Protocols in Human Genetics: Chapter 12 “Vectors For Gene Therapy” & Chapter 13 “Delivery Systems for Gene Therapy”).

In one further aspect of the present invention, the chimeric protein or a monomer of a chimeric dimer protein according to the present invention is substantially non-immunogenic, i.e., engenders little or no adverse immunological response. A variety of methods for ameliorating or eliminating deleterious immunological reactions of this sort can be used in accordance with the invention. In a preferred embodiment, the chimeric protein or a monomer of a chimeric dimer protein according to the present invention is substantially free of N-formyl methionine. Another way to avoid unwanted immunological reactions is to conjugate the chimeric protein or a monomer of a chimeric dimer protein according to the present invention to polyethylene glycol (“PEG”) or polypropylene glycol (“PPG”) (preferably of 500 to 20,000 daltons average molecular weight (MW)). Conjugation with PEG or PPG, as described by Davis et al. (U.S. Pat. No. 4,179,337) for example, can provide non-immunogenic, physiologically active, water soluble conjugates of chimeric proteins or of monomers of chimeric dimer proteins with anti-viral activity. Similar methods also using a polyethylene-polypropylene glycol copolymer are described in Saifer et al. (U.S. Pat. No. 5,006,333).

The present invention also relates to methods for use of said chimeric protein or a monomer of a chimeric dimer protein according to the invention for various applications ranging from targeted DNA cleavage to targeted gene regulation. Depending on their structures and particularly the nature [transcription regulator, protein interacting with or modifying other proteins, catalytical activities such as nuclease activity (endonuclease and exonuclease), polymerase activity, kinase activity, phosphatase activity, methylase activity, topoisomerase activity, integrase activity, transposase activity, ligase activity, helicase activity, recombinase activity], the number and the combinations of several protein domains fused to said core scaffold, chimeric protein or a monomer of a chimeric dimer protein according to the present invention allow to achieve and facilitate DNA processing activities such as creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in nucleic acid, controlling gene expression, and modifying chromatin structure, as non-limiting examples.

In a preferred embodiment, the present invention relates to a method for increasing targeted HR (and mutagenesis via NHEJ) when Double-Strand break activity is promoted in a chimeric protein or a monomer of a chimeric dimer protein according to the present invention targeting a DNA target sequence according to the invention. In another more preferred embodiment, the addition of at least two catalytically active cleavase domains according to the invention allows to increase Double-strand break-induced mutagenesis by leading to a loss of genetic information and preventing any scarless re-ligation of targeted genomic locus of interest by NHEJ.

In another preferred embodiment, the present invention relates to a method for increasing targeted HR in a more conservative fashion (with less mutagenesis via NHEJ) when Single-Strand Break activity is promoted in a chimeric protein or a monomer of a chimeric dimer protein according to the present invention targeting a DNA target sequence according to the invention.

In another preferred embodiment, the present invention relates to a method for increasing excision of a single-strand of DNA spanning the binding region of a chimeric protein or a monomer of a chimeric dimer protein according to the present invention when both one cleavase enhancer domain and one nickase enhancer domain, respectively, are fused to both N-terminus and C-terminus of at least one of the core scaffold of a chimeric protein or a monomer of a chimeric dimer protein according to the present invention.

In another preferred embodiment, the present invention relates to a method for treatment of a genetic disease caused by a mutation in a specific single double-stranded DNA target sequence in a gene, comprising administering to a subject in need thereof an effective amount of a chimeric protein or a monomer of a chimeric dimer protein, according to the present invention, a functional mutant, a variant or a derivative thereof. In a more preferred embodiment, said chimeric protein, a functional mutant, a variant or a derivative thereof for treatment of a genetic disease is independent of single-nucleotide polymorphisms (SNPs) that occur in the respective genomes of subjects in need thereof, due to TALE code degeneracy. In other words, the present invention relates to a method for treatment of a genetic disease caused by a mutation in a nucleic acid target sequence, comprising administering to a subject in need thereof, in order to cure said genetic disease, an effective amount of a chimeric protein or a monomer of a chimeric dimer protein, according to the present invention, a functional mutant, a variant or a derivative thereof wherein said chimeric protein overcomes the genomic variations of subjects due to SNPs. Said method of the present invention allows the treatment of said genetic disease by constructing and administering one unique chimeric protein according to the invention to every subject in need thereof, whatever the SNP profiles around said mutation responsible for the genetic disease in these subjects. Hence, said method of the present invention avoids the need to construct and administer one personalized chimeric protein for each subject in need thereof that takes into account each SNP profile around the mutation to cure. 
As a non-limiting example, said unique chimeric protein to cure said genetic disease according to the present invention can comprise degenerated RVDs in its core scaffold such as NN for recognizing G or A, NS for recognizing A, C, G or T or SN for recognizing G or A. As another non-limiting example, in the case where a genomic mutation responsible for a genetic disease is close to a G/A SNP variation, said method of the present invention allows this genetic disease to be treated by constructing and administering a unique chimeric protein according to the present invention wherein said core scaffold of said chimeric protein comprises an SN-type RVD for recognizing either G or A, present in the genome of every subject at the SNP location, in order to bind said genomic sequence around said SNP and treat said genetic disease. In another embodiment, said method can be used to overcome interspecies sequence variations.
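
As a purely illustrative sketch of how degenerated RVDs can cover a SNP, the following Python fragment selects one RVD per target position from the base preferences stated in this document (HD for C, NG for T, NI for A, NK for G, NN and SN for G or A, NS for any base). The function name, the tie-breaking rule and the example site are assumptions made for illustration only and are not part of the claimed method.

```python
# RVD base preferences as stated in this document (lower-case one-letter code).
RVD_SPECIFICITY = {
    "HD": {"c"},
    "NG": {"t"},
    "NI": {"a"},
    "NK": {"g"},
    "NN": {"g", "a"},
    "SN": {"g", "a"},
    "NS": {"a", "c", "g", "t"},
}


def rvds_for_site(allowed_per_position):
    """Return one RVD per position covering every base allowed at that position.

    The least degenerate covering RVD is chosen; ties (e.g. NN versus SN)
    are broken alphabetically, which is an arbitrary, hypothetical rule.
    """
    rvds = []
    for allowed in allowed_per_position:
        candidates = [
            (len(bases), name)
            for name, bases in RVD_SPECIFICITY.items()
            if allowed <= bases
        ]
        if not candidates:
            raise ValueError(f"no RVD covers {sorted(allowed)}")
        rvds.append(min(candidates)[1])
    return rvds


# Hypothetical 4-base binding site with a G/A SNP at the third position.
site = [{"t"}, {"c"}, {"g", "a"}, {"a"}]
rvd_array = rvds_for_site(site)
print(rvd_array)
```

With this tie-breaking rule the G/A position receives NN; an SN-type RVD, as in the example above, would cover the same two bases.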

In another preferred embodiment, the present invention relates to a method for inserting a transgene into a specific single double-stranded DNA target sequence of a genomic locus of a cell, tissue, non-human animal or plant, wherein at least one chimeric protein or a monomer of a chimeric dimer protein of the present invention is introduced, transiently or not, into said cell, tissue, non-human animal or plant.

In another embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence. In a preferred embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum C-terminal truncation of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum N-terminal truncation of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum C-terminal truncations of the core scaffold structure to an optimum spacer length as illustrated on FIGS. 10 and 11. 
In a preferred embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum N-terminal truncations of the core scaffold structure to an optimum spacer length.

In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum C-terminal truncation of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum C-terminal truncations of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two different optimum C-terminal truncations of the core scaffold structure to an optimum spacer length.

In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating an optimum N-terminal truncation of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum N-terminal truncations of the core scaffold structure to an optimum spacer length. In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two different optimum N-terminal truncations of the core scaffold structure to an optimum spacer length.

In other words, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal C-terminal truncation of the core scaffold of said chimeric protein as illustrated on FIG. 11.

In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal C-terminal truncation of the core scaffold of said chimeric protein, wherein said spacer length can be comprised between 5 and 40 bp. In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal C-terminal truncation of the core scaffold of said chimeric protein, wherein said spacer length can be comprised between 8 and 40 bp.

In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the binding site within said locus an optimal C-terminal truncation of the core scaffold of said chimeric protein. In a preferred embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the binding site within said locus, wherein said sequence of said locus is poor in T, an optimal C-terminal truncation of the core scaffold of said chimeric protein.

In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break location of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum C-terminal truncations of the core scaffold structure to an optimum spacer length wherein said association allows placing the cleavage site at a more desired location within said spacer as illustrated on FIG. 10. In another embodiment, said association between two optimum C-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is not in the center of said spacer as illustrated on FIGS. 10 B and C. In another embodiment, said association between two optimum C-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is in the left part (i.e. located 5′ of the center of the spacer) of said spacer as illustrated on FIG. 10 C. In another embodiment, said association between two optimum C-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is in the right part (i.e. located 3′ of the center of the spacer) of said spacer as illustrated on FIG. 10 B. In another embodiment, said association between two optimum C-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at the center of said spacer as illustrated on FIG. 10 A.
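
The left, center and right placements described above can be stated more concretely with a small Python sketch. The function below is hypothetical: it assumes the cleavage position is expressed as the number of spacer bases lying 5′ of the cut, and it treats a cut within half a base of the exact middle as central so that odd spacer lengths can also have a central cut. Neither convention is taken from the present description.

```python
def classify_cleavage(spacer_length, cut_offset):
    """Classify a cleavage site as 'left', 'center' or 'right' within a spacer.

    cut_offset is the number of spacer bases located 5' of the cut
    (0 = cut at the 5' edge, spacer_length = cut at the 3' edge).
    """
    if spacer_length <= 0:
        raise ValueError("spacer length must be positive")
    if not 0 <= cut_offset <= spacer_length:
        raise ValueError("cut lies outside the spacer")
    # Compare twice the offset with the length to avoid fractional midpoints.
    twice = 2 * cut_offset
    if abs(twice - spacer_length) <= 1:
        return "center"
    return "left" if twice < spacer_length else "right"


# Hypothetical examples: a 12 bp spacer cut in the middle and near its 5' end,
# and a 15 bp spacer cut near its 3' end.
print(classify_cleavage(12, 6))
print(classify_cleavage(12, 3))
print(classify_cleavage(15, 11))
```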

In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal N-terminal truncation of the core scaffold of said chimeric protein.

In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal N-terminal truncation of the core scaffold of said chimeric protein, wherein said spacer length can be comprised between 5 and 40 bp. In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the sequence of said locus of a given genome an optimal N-terminal truncation of the core scaffold of said chimeric protein, wherein said spacer length can be comprised between 8 and 40 bp.

In another embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the binding site within said locus an optimal N-terminal truncation of the core scaffold of said chimeric protein. In a preferred embodiment, the present invention relates to a method for increasing the number of targets that can be reached by a chimeric protein of the present invention, in a locus of a given genome, by associating to a spacer length imposed by the binding site within said locus, wherein said sequence of said locus is poor in T, an optimal N-terminal truncation of the core scaffold of said chimeric protein.

In a preferred embodiment, the present invention relates to a method for optimizing the control of double-stranded break location of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by associating two optimum N-terminal truncations of the core scaffold structure to an optimum spacer length wherein said association allows placing the cleavage site at a more desired location within said spacer. In another embodiment, said association between two optimum N-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is not in the center of said spacer. In another embodiment, said association between two optimum N-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is in the left part (i.e 5′ located regarding the center of the spacer) of said spacer. In another embodiment, said association between two optimum N-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is in the right part (i.e 3′ located regarding the center of the spacer) of said spacer. In another embodiment, said association between two optimum N-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at the center of said spacer.

In another embodiment, the present invention relates to a method for optimizing the control of nucleic acid processing activity, the double-stranded break activity of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence, by respectively associating two optimum N-terminal and C-terminal truncations of the core scaffold structure to an optimum spacer length.

In another embodiment, the present invention relates to a method for optimizing the control of double-stranded break location of a chimeric protein (or a monomer of a chimeric dimer protein according to the present invention) within its nucleic acid target sequence by respectively associating two optimum N-terminal and C-terminal truncations of the core scaffold structure to an optimum spacer length wherein said association allows placing the cleavage site at a more desired location within said spacer.

OTHER DEFINITIONS

Amino acid residues in a polypeptide sequence are designated herein according to the one-letter code, in which, for example, Q means Gln or Glutamine residue, R means Arg or Arginine residue and D means Asp or Aspartic acid residue.

Amino acid substitution means the replacement of one amino acid residue with another, for instance the replacement of an Arginine residue with a Glutamine residue in a peptide sequence is an amino acid substitution.

DNA or nucleic acid processing activity refers to a particular/given enzymatic activity of a protein domain comprised in a chimeric protein according to the invention such as in the expression “a protein domain to process said nucleic acid target sequence”. Said DNA or nucleic acid processing activity can refer to a cleavage activity, either a cleavase activity or a nickase activity, more broadly a nuclease activity, but also a polymerase activity, a kinase activity, a phosphatase activity, a methylase activity, a topoisomerase activity, an integrase activity, a transposase activity, a ligase activity, a helicase activity or a recombinase activity as non-limiting examples.

Nucleotides are designated as follows: one-letter code is used for designating the base of a nucleoside: a is adenine, t is thymine, c is cytosine, and g is guanine. For the degenerated nucleotides, r represents g or a (purine nucleotides), k represents g or t, s represents g or c, w represents a or t, m represents a or c, y represents t or c (pyrimidine nucleotides), d represents g, a or t, v represents g, a or c, b represents g, t or c, h represents a, t or c, and n represents g, a, t or c.
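
As a minimal sketch, the one-letter code above can be written as a lookup table and used to expand or match degenerated sequences. The table content follows the definitions given above; the function names and the example sequence are assumptions introduced for illustration only.

```python
from itertools import product

# Degenerated nucleotide code as defined above (lower-case one-letter code).
DEGENERATE = {
    "a": "a", "t": "t", "c": "c", "g": "g",
    "r": "ga", "k": "gt", "s": "gc", "w": "at", "m": "ac", "y": "tc",
    "d": "gat", "v": "gac", "b": "gtc", "h": "atc", "n": "gatc",
}


def expand(sequence):
    """List every non-degenerated sequence encoded by a degenerated sequence."""
    pools = [DEGENERATE[base] for base in sequence.lower()]
    return ["".join(combo) for combo in product(*pools)]


def matches(pattern, sequence):
    """Return True if a plain sequence is compatible with a degenerated pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in DEGENERATE[code]
        for code, base in zip(pattern.lower(), sequence.lower())
    )


# Hypothetical example: 'r' stands for g or a, so 'tcra' encodes two sequences.
print(expand("tcra"))
print(matches("tcra", "TCAA"))
```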

By “variant”, “chimeric protein variant” or “TALEN variant”, it is intended a chimeric protein, a chimeric protein derived from a Transcription Activator-like Effector (TALE) or a TALEN obtained by replacement of at least one residue in the amino acid sequence of the parent chimeric protein, parent chimeric protein derived from a Transcription Activator-like Effector (TALE) or parent TALEN with at least one different amino acid.

By “peptide linker” or “peptidic linker” it is intended to mean a peptide sequence which allows the connection of different monomers or different parts comprised in a fusion protein such as between a core scaffold and a protein domain in a chimeric protein according to the present invention and which allows the adoption of a correct conformation for said fusion protein activity and/or specificity. Peptide linkers can be of various sizes, from 3 amino acids to 50 amino acids as a non-limiting indicative range. Peptide linkers can also be qualified as structured or unstructured. Peptide linkers can be qualified as active linkers when they comprise active domains that are able to change their structural conformation under appropriate stimulation.

By “related to”, particularly in the expression “one cell type related to the chosen cell type or organism”, is intended a cell type or an organism sharing characteristics with said chosen cell type or said chosen organism; this cell type or organism related to the chosen cell type or organism can be derived from said chosen cell type or organism or not.

By “subdomain” it is intended a protein subdomain or a protein part that interacts with another protein subdomain or protein part to form an active entity and/or a catalytic active entity possibly bearing nucleic acid or DNA processing activity of said chimeric protein according to the invention.

By “targeting DNA construct/minimal repair matrix/repair matrix” it is intended to mean a DNA construct comprising a first and second portion that are homologous to regions 5′ and 3′ of the DNA target in situ. The DNA construct also comprises a third portion positioned between the first and second portions which comprises some homology with the corresponding DNA sequence in situ or alternatively comprises no homology with the regions 5′ and 3′ of the DNA target in situ. Following cleavage of the DNA target, a homologous recombination event is stimulated between the genome containing the targeted gene comprised in the locus of interest and the repair matrix, wherein the genomic sequence containing the DNA target is replaced by the third portion of the repair matrix and a variable part of the first and second portions of the repair matrix.
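
The three-portion layout of a repair matrix can be sketched as a simple data structure. This is an illustrative model only: the class, function and parameter names are hypothetical, the homology arms are copied directly from the flanking locus sequence, and the recombination outcome is idealised as a clean replacement of the DNA target by the third portion.

```python
from dataclasses import dataclass


@dataclass
class RepairMatrix:
    left_arm: str   # first portion, homologous to the region 5' of the target
    insert: str     # third portion, positioned between the two arms
    right_arm: str  # second portion, homologous to the region 3' of the target

    def sequence(self):
        """Full sequence of the construct, 5' to 3'."""
        return self.left_arm + self.insert + self.right_arm


def build_repair_matrix(locus, target_start, target_end, insert, arm_length):
    """Build a matrix whose arms flank locus[target_start:target_end]."""
    if not 0 <= target_start <= target_end <= len(locus):
        raise ValueError("target coordinates fall outside the locus")
    if target_start < arm_length or len(locus) - target_end < arm_length:
        raise ValueError("locus too short for the requested arm length")
    left = locus[target_start - arm_length:target_start]
    right = locus[target_end:target_end + arm_length]
    return RepairMatrix(left, insert, right)


def idealised_recombinant(locus, target_start, target_end, matrix):
    """Locus after an idealised recombination event replacing the target."""
    return locus[:target_start] + matrix.insert + locus[target_end:]


# Hypothetical locus: the 4-base DNA target 'ggtt' is replaced by 'catcat'.
locus = "aaaacccc" + "ggtt" + "ttttgggg"
matrix = build_repair_matrix(locus, 8, 12, "catcat", arm_length=4)
print(matrix.sequence())
print(idealised_recombinant(locus, 8, 12, matrix))
```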

By “functional mutant” is intended a catalytically active mutant of a protein or a protein domain; such a mutant can have the same activity as its parent protein or protein domain, or additional properties. This definition applies to chimeric proteins or protein domains that constitute chimeric proteins according to the present invention. Also encompassed in the scope of this definition are “derivatives” of these proteins or protein domains that comprise the entirety or part of these proteins or protein domains fused to other proteic or chemical parts such as tags, antibodies or polyethylene glycol as non-limiting examples.

The expression “single polypeptide chain” is used to qualify a chimeric protein according to the invention which functions as a dimer wherein one first monomer and one second monomer are fused by a peptidic linker.

By “DNA target”, “DNA target sequence”, “target DNA sequence”, “nucleic acid target sequence”, “target sequence”, or “processing site” is intended a polynucleotide sequence that can be processed by a chimeric protein according to the present invention. These terms refer to a specific DNA location, preferably a genomic location in a cell, but also a portion of genetic material that can exist independently of the main body of genetic material such as plasmids, episomes, viruses, transposons or in organelles such as mitochondria or chloroplasts as non-limiting examples. The nucleic acid target sequence is defined by the 5′ to 3′ sequence of one strand of said target, as indicated above for Avr15 (SEQ ID NO: 6) as a non-limiting example.

Adjacent is used to distinguish between 1) the nucleic acid sequence recognized and bound by a set of specific RVDs comprised in the core scaffold of said chimeric protein according to the invention and 2) the nucleic acid target sequence to be processed by said chimeric protein according to the invention, said nucleic sequences 1) and 2) being adjacent. When said chimeric protein according to the invention functions as a dimer comprising respectively two monomers, the term adjacent is equally used to qualify the nucleic acid target sequence regarding the first nucleic acid sequence recognized and bound by the first monomer and to qualify the nucleic acid target sequence regarding the second nucleic acid sequence recognized and bound by the second monomer. By the nucleic acid sequence adjacent to the nucleic acid target sequence is meant the recognition/binding site of said chimeric protein according to the invention.

By “delivery vector” or “delivery vectors” is intended any delivery vector which can be used in the present invention to put into cell contact (i.e “contacting”) or deliver inside cells or subcellular compartments agents/chemicals and molecules (proteins or nucleic acids) needed in the present invention. It includes, but is not limited to liposomal delivery vectors, viral delivery vectors, drug delivery vectors, chemical carriers, polymeric carriers, lipoplexes, polyplexes, dendrimers, microbubbles (ultrasound contrast agents), nanoparticles, emulsions or other appropriate transfer vectors. These delivery vectors allow delivery of molecules, chemicals, macromolecules (genes, proteins), or other vectors such as plasmids, peptides developed by Diatos. In these cases, delivery vectors are molecule carriers. By “delivery vector” or “delivery vectors” is also intended delivery methods to perform transfection.

The terms “vector” or “vectors” refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. A “vector” in the present invention includes, but is not limited to, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic nucleic acids. Preferred vectors are those capable of autonomous replication (episomal vector) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and are commercially available.

Viral vectors include retrovirus, adenovirus, parvovirus (e.g. adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g. measles and Sendai), positive strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).

By “lentiviral vector” is meant HIV-Based lentiviral vectors that are very promising for gene delivery because of their relatively large packaging capacity, reduced immunogenicity and their ability to stably transduce with high efficiency a large range of different cell types. Lentiviral vectors are usually generated following transient transfection of three (packaging, envelope and transfer) or more plasmids into producer cells. Like HIV, lentiviral vectors enter the target cell through the interaction of viral surface glycoproteins with receptors on the cell surface. On entry, the viral RNA undergoes reverse transcription, which is mediated by the viral reverse transcriptase complex. The product of reverse transcription is a double-stranded linear viral DNA, which is the substrate for viral integration in the DNA of infected cells.

By “integrative lentiviral vectors (or LV)” is meant, as a non-limiting example, such vectors that are able to integrate into the genome of a target cell.

Conversely, by “non integrative lentiviral vectors (or NILV)” is meant efficient gene delivery vectors that do not integrate into the genome of a target cell through the action of the virus integrase.

One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. A vector according to the present invention comprises, but is not limited to, a YAC (yeast artificial chromosome), a BAC (bacterial artificial chromosome), a baculovirus vector, a phage, a phagemid, a cosmid, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. In general, expression vectors of utility in recombinant DNA techniques are often in the form of “plasmids”, which refer generally to circular double-stranded DNA loops which, in their vector form, are not bound to the chromosome. Large numbers of suitable vectors are known to those of skill in the art. Vectors can comprise selectable markers, for example: neomycin phosphotransferase, histidinol dehydrogenase, dihydrofolate reductase, hygromycin phosphotransferase, herpes simplex virus thymidine kinase, adenosine deaminase, glutamine synthetase, and hypoxanthine-guanine phosphoribosyl transferase for eukaryotic cell culture; TRP1 for S. cerevisiae; tetracycline, rifampicin or ampicillin resistance in E. coli. Preferably said vectors are expression vectors, wherein a sequence encoding a polypeptide of interest is placed under control of appropriate transcriptional and translational control elements to permit production or synthesis of said polypeptide. Therefore, said polynucleotide is comprised in an expression cassette.
More particularly, the vector comprises a replication origin, a promoter operatively linked to said encoding polynucleotide, a ribosome binding site, an RNA-splicing site (when genomic DNA is used), a polyadenylation site and a transcription termination site. It can also comprise enhancer or silencer elements. Selection of the promoter will depend upon the cell in which the polypeptide is expressed. Suitable promoters include tissue specific and/or inducible promoters. Examples of inducible promoters are: eukaryotic metallothionein promoter which is induced by increased levels of heavy metals, prokaryotic lacZ promoter which is induced in response to isopropyl-β-D-thiogalacto-pyranoside (IPTG) and eukaryotic heat shock promoter which is induced by increased temperature. Examples of tissue specific promoters are skeletal muscle creatine kinase, prostate-specific antigen (PSA), α-antitrypsin protease, human surfactant (SP) A and B proteins, β-casein and acidic whey protein genes.

Inducible promoters may be induced by pathogens or stress, more preferably by stress like cold, heat, UV light, or high ionic concentrations (reviewed in Potenza C et al. 2004, In vitro Cell Dev Biol 40:1-22). Inducible promoters may also be induced by chemicals (reviewed in Moore, Samalova et al. 2006; Padidam 2003; Wang, Zhou et al. 2003; Zuo and Chua 2000).

Delivery vectors and vectors can be associated or combined with any cellular permeabilization techniques such as sonoporation or electroporation or derivatives of these techniques.

By cell or cells is intended any prokaryotic or eukaryotic living cells, cell lines derived from these organisms for in vitro cultures, primary cells from animal or plant origin.

By “primary cell” or “primary cells” are intended cells taken directly from living tissue (i.e. biopsy material) and established for growth in vitro, that have undergone very few population doublings and are therefore more representative of the main functional components and characteristics of the tissues from which they are derived, in comparison to continuous tumorigenic or artificially immortalized cell lines. These cells thus represent a more valuable model of the in vivo state to which they refer.

In the frame of the present invention, “eukaryotic cells” refer to a fungal, plant or animal cell or a cell line derived from the organisms listed below and established for in vitro culture. More preferably, the fungus is of the genus Aspergillus, Penicillium, Acremonium, Trichoderma, Chrysosporium, Mortierella, Kluyveromyces or Pichia; more preferably, the fungus is of the species Aspergillus niger, Aspergillus nidulans, Aspergillus oryzae, Aspergillus terreus, Penicillium chrysogenum, Penicillium citrinum, Acremonium chrysogenum, Trichoderma reesei, Mortierella alpina, Chrysosporium lucknowense, Kluyveromyces lactis, Pichia pastoris or Pichia ciferrii.

More preferably the plant is of the genus Arabidopsis, Nicotiana, Solanum, Lactuca, Brassica, Oryza, Asparagus, Pisum, Medicago, Zea, Hordeum, Secale, Triticum, Capsicum, Cucumis, Cucurbita, Citrullus, Citrus, Sorghum; more preferably, the plant is of the species Arabidopsis thaliana, Nicotiana tabacum, Solanum lycopersicum, Solanum tuberosum, Solanum melongena, Solanum esculentum, Lactuca sativa, Brassica napus, Brassica oleracea, Brassica rapa, Oryza glaberrima, Oryza sativa, Asparagus officinalis, Pisum sativum, Medicago sativa, Zea mays, Hordeum vulgare, Secale cereale, Triticum aestivum, Triticum durum, Capsicum sativus, Cucurbita pepo, Citrullus lanatus, Cucumis melo, Citrus aurantifolia, Citrus maxima, Citrus medica, Citrus reticulata.

More preferably the animal cell is of the genus Homo, Rattus, Mus, Sus, Bos, Danio, Canis, Felis, Equus, Salmo, Oncorhynchus, Gallus, Meleagris, Drosophila, Caenorhabditis; more preferably, the animal cell is of the species Homo sapiens, Rattus norvegicus, Mus musculus, Sus scrofa, Bos taurus, Danio rerio, Canis lupus, Felis catus, Equus caballus, Salmo salar, Oncorhynchus mykiss, Gallus gallus, Meleagris gallopavo, Drosophila melanogaster, Caenorhabditis elegans.

In the present invention, the cell can be a plant cell, a mammalian cell, a fish cell, an insect cell or cell lines derived from these organisms for in vitro cultures or primary cells taken directly from living tissue and established for in vitro culture. As non limiting examples cell lines can be selected from the group consisting of CHO-K1 cells; HEK293 cells; Caco2 cells; U2-OS cells; NIH 3T3 cells; NSO cells; SP2 cells; CHO-S cells; DG44 cells; K-562 cells, U-937 cells; MRC5 cells; IMR90 cells; Jurkat cells; HepG2 cells; HeLa cells; HT-1080 cells; HCT-116 cells; Hu-h7 cells; Huvec cells; Molt 4 cells.

All these cell lines can be modified by the method of the present invention to provide cell line models to produce, express, quantify, detect, study a gene or a protein of interest; these models can also be used to screen biologically active molecules of interest in research and production and various fields such as chemical, biofuels, therapeutics and agronomy as non-limiting examples.

By “homologous” is intended a sequence with enough identity to another one to lead to homologous recombination between sequences, more particularly having at least 95% identity, preferably 97% identity and more preferably 99% identity.

“Identity” refers to sequence identity between two nucleic acid molecules or polypeptides. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are identical at that position. A degree of similarity or identity between nucleic acid or amino acid sequences is a function of the number of identical or matching nucleotides or amino acids at positions shared by the sequences. Various alignment algorithms and/or programs may be used to calculate the identity between two sequences, including FASTA, or BLAST which are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings.
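As an illustration only, the position-by-position comparison described above can be sketched in Python for two sequences that have already been aligned. The function names, the "-" gap symbol and the choice to count only positions present in both sequences are assumptions made for this sketch; programs such as FASTA or BLAST compute and report identity with their own conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of matching residues over the aligned positions shared by both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only alignment columns where neither sequence carries a gap.
    shared = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not shared:
        return 0.0
    matches = sum(1 for a, b in shared if a == b)
    return 100.0 * matches / len(shared)


def is_homologous(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """Hypothetical helper: apply an identity threshold such as 95%, 97% or 99%."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned 100-nucleotide sequences differing at four positions would give 96% identity under this convention, which meets the 95% threshold but not the 97% one.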

By “mutation” is intended the substitution, deletion or insertion of one or more nucleotides/amino acids in a polynucleotide (cDNA, gene) or a polypeptide sequence. Said mutation can affect the coding sequence of a gene or its regulatory sequence. It may also affect the structure of the genomic sequence or the structure/stability of the encoded mRNA.

In the frame of the present invention, the expression “double-strand break-induced mutagenesis” (DSB-induced mutagenesis) refers to a mutagenesis event consecutive to an NHEJ event following an endonuclease-induced DSB, leading to insertion/deletion at the cleavage site of an endonuclease.

By “gene” is meant the basic unit of heredity, consisting of a segment of DNA arranged in a linear manner along a chromosome, which codes for a specific protein or segment of protein. A gene typically includes a promoter, a 5′ untranslated region, one or more coding sequences (exons), optionally introns, a 3′ untranslated region. The gene may further comprise a terminator, enhancers and/or silencers.

As used herein, the term “transgene” refers to a sequence encoding a polypeptide. Preferably, the polypeptide encoded by the transgene is either not expressed, or expressed but not biologically active, in the cell, tissue or individual in which the transgene is inserted. Most preferably, the transgene encodes a therapeutic polypeptide useful for the treatment of an individual.

The term “gene of interest” or “GOI” refers to any nucleotide sequence encoding a known or putative gene product.

As used herein, the term “locus” is the specific physical location of a DNA sequence (e.g. of a gene) on a chromosome. The term “locus” usually refers to the specific physical location of a chimeric protein's nucleic target sequence on a chromosome. Such a locus, which comprises a target sequence that is recognized and cleaved by a chimeric protein according to the invention, is referred to as “locus according to the invention”. Also, the expression “genomic locus of interest” is used to qualify a nucleic acid sequence in a genome that can be a putative target for a double-strand break according to the invention. It is understood that the considered genomic locus of interest of the present invention can not only qualify a nucleic acid sequence that exists in the main body of genetic material (i.e. in a chromosome) of a cell but also a portion of genetic material that can exist independently to said main body of genetic material such as plasmids, episomes, virus, transposons or in organelles such as mitochondria or chloroplasts as non-limiting examples.

By the expression “loss of genetic information” is understood the elimination or addition of at least one given DNA fragment (at least one nucleotide) or sequence within the intervening sequence between at least two processing sites of the chimeric protein of the present invention or between two chimeric proteins according to the present invention. This loss of genetic information can be, as a non-limiting example, the elimination of an intervening sequence between two processing sites of two chimeric proteins according to the present invention. As another non-limiting example, this loss of genetic information can also be an excision of a single strand of DNA spanning the binding region of a chimeric protein according to the present invention.

By “scarless re-ligation” or “scarless religation” is intended the perfect re-ligation event, without loss of genetic information (no insertion/deletion events) of the DNA broken ends through NHEJ process after the creation of a double-strand break event.

By “Imprecise NHEJ” is intended the re-ligation of nucleic acid ends generated by a DSB, with insertions or deletions of nucleotides. Imprecise NHEJ is an outcome and not a repair pathway and can result from different NHEJ pathways (Ku dependent or Ku independent as non-limiting examples).

By “fusion protein” is intended the result of a well-known process in the art consisting in the joining of two or more genes which originally encode for separate proteins or part of them, the translation of said “fusion gene” resulting in a single polypeptide with functional properties derived from each of the original proteins.

By “chimeric protein” according to the present invention is meant any fusion protein comprising a core scaffold comprising a set of RVDs to bind a nucleic acid sequence and one protein domain to process a nucleic acid target sequence adjacent to said bound nucleic acid sequence. Said chimeric protein according to the present invention can function as a dimer wherein each monomer (a monomer of a chimeric dimer protein in this case) constituting said chimeric dimer protein comprises a set of RVDs to bind a nucleic acid sequence and one protein domain to process a nucleic acid target sequence adjacent to said bound nucleic acid sequence.

By “protein domain” or “catalytic domain” is meant the nucleic acid target sequence processing part of said chimeric protein according to the present invention. Said protein domain or catalytic domain can provide any catalytical activity as classified and named according to the reaction they catalyze [Enzyme Commission number (EC number) at http://www.chem.qmul.ac.uk/iubmb/enzyme/)]. Said protein domain or catalytic domain can be a catalytically active entity by itself. Said protein domain or catalytic domain can be a protein subdomain that needs to interact with another protein subdomain to form a dimeric protein domain active entity. From a chimeric dimer protein point of view according to the present invention, said protein domain or catalytic domain can be a first protein subdomain interacting with a second protein subdomain of another chimeric monomer protein according to the invention to form the catalytically active protein entity able to process the nucleic acid target sequence.

By a “TALE-nuclease” (TALEN) is intended a fusion protein consisting of a DNA-binding domain derived from a Transcription Activator Like Effector (TALE) and one nuclease catalytic domain to cleave a nucleic acid target sequence. Said TALEN is a subclass of chimeric protein according to the present invention.

By “spacer” is meant the nucleic acid area that separates the two nucleic acid sequences recognized and bound by each monomer constituting a chimeric dimer protein according to the invention. By “spacer length” is meant the nucleic acid distance that separates the two nucleic acid sequences recognized and bound by each monomer constituting a chimeric dimer protein according to the invention. According to the present invention, said nucleic acid target sequence of the chimeric protein according to the present invention can be encompassed in said spacer. Said nucleic acid target sequence of the chimeric protein according to the present invention can be identical to said spacer. Said nucleic acid target sequence of the chimeric protein according to the present invention can be different from said spacer.

The above written description of the invention provides a manner and process of making and using it such that any person skilled in this art is enabled to make and use the same, this enablement being provided in particular for the subject matter of the appended claims, which make up a part of the original description.

As used above, the phrases “selected from the group consisting of,” “chosen from,” and the like include mixtures of the specified materials.

Where a numerical limit or range is stated herein, the endpoints are included. Also, all values and subranges within a numerical limit or range are specifically included as if explicitly written out.

The above description is presented to enable a person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the preferred embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Thus, this invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Having generally described this invention, a further understanding can be obtained by reference to certain specific examples, which are provided herein for purposes of illustration only, and are not intended to be limiting unless otherwise specified.

EXAMPLES

Example 1 TAL Nuclease (TALEN) Activities in Yeast and Mammalian Cells

1.A: Activity in Yeast

The amino acid sequences of the N-terminal and C-terminal domains and of the RVDs were based on the AvrBs3 TAL (ref: GenBank: X16130.1, SEQ ID NO: 1).

The DNA encoding the N-terminal domain [referred to as Nter wt or WT Nter (SEQ ID NO: 292), i.e. corresponding to the N-terminal domain of natural AvrBs3 (SEQ ID NO: 1) except for an Ala residue in position 2], the C-terminal domain [referred to as Cter wt or WT Cter (SEQ ID NO: 400), lacking the activation domain of the C-terminal domain of natural AvrBs3 (SEQ ID NO: 1)] and the nuclease catalytic head were synthesized (TopGene Technologies) and subcloned into the pCLS0542 (SEQ ID NO: 2) yeast expression plasmid, using NcoI and EagI restriction enzymes, leading to the backbone plasmid pCLS7183 (referred to as backbone wt, SEQ ID NO: 3). The C-terminal and the N-terminal domains are separated by two BsmBI restriction sites. The AvrBs3-derived set of repeat domains (RVDs) (SEQ ID NO: 4) was subcloned in the pCLS7183 using type IIs restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to pCLS7184 and subsequent AvrBs3-derived TALEN (referred to as control wt, SEQ ID NO: 5). All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The AvrBs3-derived TALEN was tested at 30° C. in our yeast SSA assay previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) as homodimer (two identical recognition sequences are placed facing each other on both DNA strands) on the target Avr15 (SEQ ID NO: 6, Table 3). TALEN cleavage activity levels on its respective target in yeast are shown on FIG. 2.

1.B: Activity in Mammalian Cells (CHO-K1)

The DNA encoding an HA tag, the N-terminal domain (referred to as Nter wt), the C-terminal domain (referred to as Cter wt) and the nuclease catalytic head were synthesized and subcloned into the pCLS1853 (SEQ ID NO: 7) mammalian expression plasmid, using AscI and XhoI restriction enzymes, leading to the backbone plasmid pCLS7111 (SEQ ID NO: 8). The C-terminal and the N-terminal domains are separated by two BsmBI restriction sites. The AvrBs3-derived set of repeat domains (RVDs) (SEQ ID NO: 4) was subcloned in the pCLS7111 using type IIs restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to pCLS7509 and subsequent TALEN (SEQ ID NO: 9).

All the mammalian target reporter plasmids containing the TALEN DNA target sequences were constructed using the standard Gateway protocol (INVITROGEN) into a CHO reporter vector (Grizot, Epinat et al.; Arnould, Chames et al. 2006). Activity of AvrBs3-derived TALEN was tested in our extrachromosomal assay in mammalian cells (CHO K1) as homodimer (two identical recognition sequences are placed facing each other on both DNA strands) on the target Avr15 (SEQ ID NO: 6, Table 3). For this assay, CHO K1 cells were transfected in a 96-well plate format with 75 or 200 ng of target vector and an increasing quantity of each variant DNA from 0.7 to 25 ng, in the presence of PolyFect reagent (1 μL per well). The total amount of transfected DNA was completed to 125 or 250 ng (target DNA, variant DNA, carrier DNA) using an empty vector. 72 hours after transfection, culture medium was removed and 150 μl of lysis/revelation buffer for β-galactosidase liquid assay was added. After incubation at 37° C., OD was measured at 420 nm. The entire process is performed on an automated Velocity11 BioCel platform (Grizot, Epinat et al.).
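The amount of empty (carrier) vector needed per well follows from the totals stated above by simple subtraction. The short Python sketch below illustrates that arithmetic; the function name is hypothetical, and pairing 75 ng of target with the 125 ng total (and 200 ng with 250 ng) in the example calls is an assumption, since the text does not state which target amount goes with which total.

```python
def carrier_dna_ng(total_ng: float, target_ng: float, variant_ng: float) -> float:
    """Empty-vector DNA (ng) needed to complete a well to the stated total."""
    carrier = total_ng - target_ng - variant_ng
    if carrier < 0:
        raise ValueError("target and variant DNA already exceed the total amount")
    return carrier


# Assumed pairings, for illustration only.
low_format = carrier_dna_ng(total_ng=125, target_ng=75, variant_ng=25)
high_format = carrier_dna_ng(total_ng=250, target_ng=200, variant_ng=0.7)
```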

TALEN cleavage activity levels on their respective targets in mammalian cells are shown on FIG. 3.

Example 2 Engineering of the N-Terminal Domain

2.A: Rational Truncation of the N-Terminal Domain

Truncations of the first 153 amino acid residues (numbering based on SEQ ID NO: 1) of the N-terminal domain of the AvrBs3-derived TALEN (pCLS7184, SEQ ID NO: 5) were realized. DNA sequence corresponding to amino acids D154 to N228 was amplified by PCR, using the backbone plasmid pCLS7183 (referred to as backbone wt, SEQ ID NO: 3) as template, to add a NcoI restriction site in 5′ and a XmnI restriction site in 3′. The PCR construct was subcloned in the TALEN yeast expression backbone (pCLS7183, SEQ ID NO: 3) to replace the sequence of the N-terminal domain (referred to as Nter wt) comprised between the NcoI and XmnI restriction sites, leading to pCLS7724 (SEQ ID NO: 10). All DNA sequences were validated by sequencing. The AvrBs3-derived set of repeat domains (RVDs) (SEQ ID NO: 4) was subcloned in the pCLS7724 using type IIs restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to pCLS7725 and subsequent TALEN (SEQ ID NO: 11). This truncated variant was screened in our yeast SSA assay (see Example 1) as homodimer (two identical recognition sequences are placed facing each other on both DNA strands) on the target Avr15 (SEQ ID NO: 6, Table 3). Activity level of the truncated variant is shown on FIG. 4.

2.B: Random Truncation of the N-Terminal Domain

Incremental truncation of the DNA coding for the N-terminal domain of the AvrBs3-derived TALEN (pCLS7184, SEQ ID NO: 5), starting from the 5′ of the coding strand, allows the identification of minimal N-terminal domains that could still lead to an active TALEN. Experimentally, the complete sequence of the N-terminal domain of AvrBs3 is amplified by PCR. Restriction sites BsmBI, XmnI and SacI or AatII are introduced, in this specific order, at the 3′ of the coding strand. After digestion with SacI or AatII, incremental truncation of the 5′ is performed using a 3′→5′ exonuclease III. The 3′ overhang created by the SacI or AatII digestion being protected from the exonuclease III digestion, unidirectional deletions are performed by controlling the digestion time and reaction conditions, and a homogeneous distribution of DNA fragment sizes (library) is obtained. Resulting DNA products (library) are blunt-ended, digested by XmnI and subcloned in a TALEN yeast expression backbone (pCLS7183, SEQ ID NO: 3) to replace the sequence of the full N-terminal domain. The AvrBs3-derived set of repeat domains (RVDs) (SEQ ID NO: 4) or any other previously synthesized RVD sequence is subcloned in the previously constructed plasmids using type IIs restriction enzymes BsmBI for the receiving plasmid. All created constructions are screened in our yeast SSA assay (see Example 1) for activity and specificity toward the AvrBs3 target or any other targets.

A particular truncated variant is judged useful if it provides a minimal 5% retention in activity of the starting AvrBs3-derived TALEN (SEQ ID NO: 5) on its Avr15 target (SEQ ID NO: 6), more preferably a minimal 10% retention, more preferably 20%, more preferably 30%, more preferably 40%, more preferably 50%, again more preferably a retention in activity greater than 50%.
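The retention criterion used here, and repeated for the other variant classes in these Examples, amounts to expressing a variant's activity as a percentage of the activity of the starting AvrBs3-derived TALEN on Avr15 and comparing it with the stated tiers. A minimal Python sketch follows; the function names are hypothetical and the activity values are assumed to be readouts of the same assay on the same scale.

```python
# Retention tiers (in %) taken from the text: 5, 10, 20, 30, 40, 50.
RETENTION_TIERS = (5, 10, 20, 30, 40, 50)


def percent_retention(variant_activity: float, reference_activity: float) -> float:
    """Variant activity as a percentage of the reference TALEN activity."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant_activity / reference_activity


def highest_tier_met(variant_activity: float, reference_activity: float):
    """Return the highest tier reached, or None if retention is below 5%."""
    retention = percent_retention(variant_activity, reference_activity)
    met = [tier for tier in RETENTION_TIERS if retention >= tier]
    return max(met) if met else None
```

Under this sketch a variant is "useful" whenever `highest_tier_met` returns a value other than `None`.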

In addition, a particular variant is judged useful if it provides, on any targets having a C, G or A at position 0, a minimal 5% retention in activity of the starting AvrBs3-derived TALEN (SEQ ID NO: 5) on its Avr15 target (SEQ ID NO: 6), more preferably a minimal 10% retention, more preferably 20%, more preferably 30%, more preferably 40%, more preferably 50%, again more preferably a retention in activity greater than 50%.

2.C: Engineering of the N-Terminal Domain for Specific Recognition of the Base 0 or RVD0 of the Target

Sequence and structure-based homology modelings of the C-terminal part of the N-terminal domain have pinpointed positions involved in the specific requisite of a T at position 0 of the target. Different sets of experiments are realized to overcome this limitation. In a first set of experiments, variants of the C-terminal part of the N-terminal domain are constructed to replace either the amino acids K265 and R266 by NN or SN or SNN or the amino acid R266 by N or NN. In a second set of experiments, the codons coding for these two positions are fully randomized by either two or three codons. In a third set of experiments, the complete C-terminal part of the N-terminal domain (amino acids Leu255 to Asn288) is replaced by a full RVD. All these experimental procedures are realized by using site directed mutagenesis techniques and/or combination of PCR and restriction ligation techniques well known in the art. These variants are screened for activity and specificity toward the base at position 0 (A, T, C or G) in our yeast assay (see example 1).

A particular variant is judged useful if it provides, on any or all targets, a minimal 5% retention in activity of the starting AvrBs3-derived TALEN (SEQ ID NO: 5) on its Avr15 target (SEQ ID NO: 6), more preferably a minimal 10% retention, more preferably 20%, more preferably 30%, more preferably 40%, more preferably 50%, again more preferably a retention in activity greater than 50%.

Example 3 Engineering of the C-Terminal Domain

3.A: Rational Truncation of the C-Terminal Domain

DNA sequences corresponding to truncations (numbering based on SEQ ID NO: 1) after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40), D950 (C64), R1000 (C115) and D1059 (C172) were amplified by PCR, using the backbone plasmid pCLS7183 (referred to as backbone wt, SEQ ID NO: 3) as template, to add a XmnI restriction site in 5′ and a BamHI restriction site in 3′ (protein sequences of truncated C-terminal domains C11 to C172 are respectively given in SEQ ID NO: 295 to 300). The PCR constructs were subcloned in the TALEN yeast expression backbone (pCLS7183, SEQ ID NO: 3) to replace the sequence of the full C-terminal domain, leading to pCLS7820, pCLS7802, pCLS7806, pCLS7808, pCLS7810, pCLS7812, pCLS7816 (SEQ ID NO: 12 to 18). All DNA sequences were validated by sequencing. The AvrBs3-derived set of repeat domains (RVDs) (SEQ ID NO: 4) was subcloned in pCLS7820, pCLS7802, pCLS7806, pCLS7808, pCLS7810, pCLS7812, pCLS7816 (SEQ ID NO: 12 to 18) using type IIs restriction enzymes (BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence), leading to pCLS7821, pCLS7803, pCLS7807, pCLS7809, pCLS7811, pCLS7813, pCLS7817 and subsequent TALENs (SEQ ID NO: 19 to 25). These truncated variants were screened in our yeast SSA assay as homodimers (two identical recognition sequences are placed facing each other on both DNA strands) on the target Avr15 (SEQ ID NO: 6, Table 3) and activity levels are shown on FIG. 5.

In addition 27 custom TALENs were tested as homodimers, in four different scaffolds (C0 truncated C-terminal domain, SEQ ID NO: 26 to 52; C11 truncated C-terminal domain, SEQ ID NO: 53 to 79; C40 truncated C-terminal domain, SEQ ID NO: 80 to 106 and full wt C-terminal domain, SEQ ID NO: 107 to 133; respective nucleic target sequences with bound nucleic acid sequences are given in table 7, SEQ ID NO: 193 to SEQ ID NO: 219). The activity of nearly all of these TALEN was increased by using truncated scaffold compared to the full C-terminal domain, also showing that respective truncation effects are not exclusive of a specific RVD sequence (Tables 6 and 7).

3.B: Lack of Specificity of the Last Terminal Half RVD

TAL effectors possess a characteristic truncated RVD (the so-called half repeat) at the end (C-terminus) of the repeat region. This half repeat is supposed to target specifically the last base of the target sequence. To assess this specificity, an AvrBs3-derived TALEN (pCLS7184, SEQ ID NO: 5) was screened, in our yeast assay, for activity on four identical targets except for the last base (A, T, G or C, in the n position, SEQ ID NO: 171 to 174, Table 4). No significant differences in activity were observed on the four targets, as shown on FIG. 7, indicating the lack of specificity of the last half RVD in this TALEN context.
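The design of the four test targets can be illustrated with a small Python sketch that takes a recognition sequence and returns the four versions differing only at the last position. The function name and the example sequence are hypothetical placeholders; the actual targets are those of SEQ ID NO: 171 to 174 (Table 4).

```python
def last_base_variants(recognition_site: str) -> dict:
    """Return four sites identical except for the last base (A, T, G or C)."""
    if not recognition_site:
        raise ValueError("recognition site must not be empty")
    prefix = recognition_site.upper()[:-1]
    return {base: prefix + base for base in "ATGC"}


# Placeholder sequence for illustration, not an AvrBs3 target.
variants = last_base_variants("ACGTACGTAC")
```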

3.C: Replacement of the C-Terminal Domain by a Polypeptide Linker

We generated a first library of 37 different linkers. Many of them have a common structure comprising a variable region encoding 3 to 28 amino acid residues, flanked by regions encoding an SGGSGS stretch at both the 5′ and 3′ ends (SEQ ID NO: 134 to 170 and SEQ ID NO: 403 to 439). These linkers contain XmaI and BamHI restriction sites in their 5′ and 3′ ends respectively. The linker library is then subcloned in pCLS7183 (SEQ ID NO: 3) via the XmaI and BamHI restriction sites to replace the C-terminal domain of the AvrBs3-derived TALEN (pCLS7184, SEQ ID NO: 5). The AvrBs3-derived set of repeat domains (RVDs) or any other RVD sequences having or lacking the terminal half RVD is cloned in this backbone library plasmid as described in Example 3A and resulting clones are screened in our yeast SSA assay (see Example 1).

A particular polypeptide linker domain is judged useful if it provides a minimal 5% retention in activity of the starting AvrBs3-derived TALEN (SEQ ID NO: 5) on its Avr15 target (SEQ ID NO: 6), more preferably a minimal 10% retention, more preferably 20%, more preferably 30%, more preferably 40%, more preferably 50%, again more preferably a retention in activity greater than 50%.

The DNAs (so-called “polypeptide linkers”) coding for seven different polypeptides (SEQ ID NO: 479 to 485) were prepared by PCR using standard molecular biology procedures. These linkers contain, at the DNA level, XmaI and BamHI restriction sites in their 5′ and 3′ ends respectively. These seven linkers were then subcloned individually into a XmaI and BamHI pre-digested backbone pCLS9943 (SEQ ID NO: 486) via the XmaI and BamHI restriction sites to create a new C-terminal domain linker scaffold (pCLS12233 to 12238 and pCLS12270, SEQ ID NO: 487 to 493). This backbone, pCLS9943, contains an additional N-terminal NLS sequence followed by an HA tag and a C11 truncated C-terminal domain compared to the original pCLS7183. The RVD arrays coding for RAGT2.3 (SEQ ID NO: 494) were subcloned individually in the pCLS12233 to pCLS12237 and pCLS12270 using type IIs restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to the seven constructs, pCLS12945 to pCLS12951 (SEQ ID NO: 495 to 501).

The resulting constructs were screened in our yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on the RAGT2.3 pseudo palindromic target (SEQ ID NO: 214, Table 8). Activities of all seven constructs on the RAGT2.3 pseudo palindromic target are summarized in FIG. 22.

Among the 37 different linkers tested, it was found that Linkers #8, #27 and #35 (SEQ ID NO: 141, 160 and 168 respectively) showed significant activity toward AvrBs3 target in our yeast assay (FIG. 26).

Example 4 Effect of the Spacer Length on Engineered TALENs

All truncated (and non-truncated) variants of the AvrBs3-derived TALENs previously described (Example 3A) were screened in the yeast assay as homodimers on targets having spacers varying in length from 5 bps to 40 bps (SEQ ID NO: 220 to 255, Table 3) in order to identify spacer lengths that enable the most efficient activity. Resulting activities of the TALENs were clearly dependent on the spacer length and on the truncation of the C-terminal domain (FIGS. 6 A to G).

Activity is detected for targets with a spacer ranging from 15 to 30 bp for the TALEN having the full C-terminal domain; from 8 to 30 bp for the TALEN having the truncation C0 or C11; from 9 to 30 bp for the TALEN having the truncation C28; from 11 to 30 bp for the TALEN having the truncation C40; from 12 to 30 bp for the TALEN having the truncation C64 or C115; and from 13 to 30 bp for the TALEN having the truncation C172.
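The spacer ranges listed above can be held in a simple lookup table to ask which C-terminal scaffolds showed detectable activity at a given spacer length. The Python sketch below only encodes the ranges stated in this Example; the names are hypothetical, and it says nothing about the level of activity within each range.

```python
# Spacer ranges (bp, inclusive) with detected activity, as stated in the text.
ACTIVE_SPACER_BP = {
    "Cter wt": (15, 30),
    "C0": (8, 30),
    "C11": (8, 30),
    "C28": (9, 30),
    "C40": (11, 30),
    "C64": (12, 30),
    "C115": (12, 30),
    "C172": (13, 30),
}


def scaffolds_active_at(spacer_bp: int) -> list:
    """List the scaffolds whose reported range includes the given spacer length."""
    return [
        name
        for name, (low, high) in ACTIVE_SPACER_BP.items()
        if low <= spacer_bp <= high
    ]
```

For instance, a 10 bp spacer falls within the reported range of the C0, C11 and C28 scaffolds only.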

Although a detectable activity was observed on the majority of the described targets (SEQ ID NO: 220 to 255, Table 3), an observed bias towards smaller spacer for some specific truncations (e.g. C0 and C11) associated with a bias toward longer spacer for other specific truncations (e.g. C117) allows determination of the optimal C-terminal truncation of the core scaffold toward a target with specific spacer length, thus also increasing the reachable sequence space of possible targets.

Example 5 Effect of Asymmetric C-Terminal Truncations on Activity Versus Spacer Length

Since heterodimeric TALEN requires such large pseudo palindromic binding sites, such sequences are unlikely to be naturally present in genomic target. Relationship between spacer length and C-terminal truncations has been studied. In such a case of heterodimeric targets, C-terminal truncations on both TALENs do not require being identical and asymmetrical truncations have been tested. All combinations of C-terminal truncated variants (prepared as described in example 3A) from the two distinct parent TALENs, respectively AvrBs3-derived (SEQ ID NO: 5) and RAGT2R (SEQ ID NO: 127) were co-transformed and tested in our yeast SSA assay (see Example 1) on 36 heterodimeric targets (SEQ ID NO: 256 to 291, Table 5).

Results show that different truncations can be associated in the same TALEN, modulating the activity over the spacer length space. Such (asymmetrical) architectures direct the cleavage more precisely on the target spacer, either on the left part, on the right part or on the center part. In other words, such results allow optimizing the control of double-stranded break localization of a TALEN within its nucleic acid target sequence by associating two optimum C-terminal truncations of the core scaffold structure to an optimum spacer length, wherein said association allows placing the cleavage site at a more desired location within said spacer. Said association between two optimum C-terminal truncations of the core scaffold structure and an optimum spacer length allows placing the cleavage site at a more desired location that is in the center of said spacer or not, i.e. either in the left part (5′ located regarding the center of the spacer) of said spacer or in the right part (3′ located regarding the center of the spacer) of said spacer.

Activities of asymmetrical C-terminal truncated TALEN variants are provided in FIGS. 16 to 19. TALEN couples are combinations of AvrBs3-derived (SEQ ID NO: 5) and RAGT2R (SEQ ID NO: 127) parent TALEN containing respectively C0, C11, C40, C117 and Cter WT C-terminal domains for AvrBs3-derived constructs and C0, C11, C40 and Cter WT C-terminal domains for RAGT2R constructs.
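The set of TALEN couples described above can be enumerated as the Cartesian product of the two lists of C-terminal domains. The Python sketch below is an illustration only; the variable and function names are hypothetical, and the labels are copied from the paragraph above.

```python
from itertools import product

# C-terminal domains listed in the text for each parent TALEN.
AVRBS3_CTERS = ["C0", "C11", "C40", "C117", "Cter WT"]
RAGT2R_CTERS = ["C0", "C11", "C40", "Cter WT"]


def truncation_pairs() -> list:
    """All (AvrBs3-derived, RAGT2R) C-terminal domain combinations."""
    return list(product(AVRBS3_CTERS, RAGT2R_CTERS))


def asymmetric_pairs() -> list:
    """Combinations in which the two monomers carry different C-terminal domains."""
    return [(left, right) for left, right in truncation_pairs() if left != right]
```

With these lists the product contains 20 couples, 16 of which are asymmetrical.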

Example 6 Activity of C-Terminal Truncated TALEN in Mammalian Cells

The DNA encoding a nuclear localization sequence NLS, either a HA tag or a S tag, the N-terminal domain, the C11 truncated C-terminal domain and a nuclease catalytic head was synthesized (TopGene Technologies) and subcloned into the pCLS1853 (SEQ ID NO: 7) mammalian expression plasmid, using AscI and XhoI restriction enzymes, leading to the backbone plasmids pCLS8425 (HA tag) and pCLS8429 (S tag) (SEQ ID NO: 175 and 176).

The C-terminal and the N-terminal domains are separated by two BsmBI restriction sites. The sets of repeat domains (RVDs) binding the left part of the DNA target sequences DMDT2.1, ILRGT2.1, and HBBT1.1 (SEQ ID NO: 189, SEQ ID NO: 190 and SEQ ID NO: 192) were subcloned in the pCLS8425 using type IIs restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to pCLS8453 (DMDT2.1 left; SEQ ID NO: 180), pCLS8445 (ILRGT2.1 left; SEQ ID NO: 181), and pCLS8461 (HBBT1.1 left; SEQ ID NO: 182) and subsequent left monomer TALENs (SEQ ID NO: 180 to 182). The sets of repeat domains (RVDs) binding the right part of the DNA target sequences DMDT2.1, ILRGT2.1, and HBBT1.1 (SEQ ID NO: 183 to 185) were subcloned in the pCLS8429 using type IIs restriction enzymes BsmBI for the receiving plasmids and BbvI and SfaNI for the inserted RVD sequences, leading to pCLS8457 (DMDT2.1 right; SEQ ID NO: 186), pCLS8449 (ILRGT2.1 right; SEQ ID NO: 187), and pCLS8465 (HBBT1.1 right; SEQ ID NO: 188) and subsequent right monomer TALENs (SEQ ID NO: 186 to 188).

The plasmid pair pCLS8453 and pCLS8457 was co-transformed, with its DMDT2.1 target, into CHO-K1 cells in order to express the heterodimeric TALEN. Activity of the TALEN was screened in our mammalian SSA assay (see Example 1).

The plasmid pair pCLS8445 and pCLS8449 was co-transformed, with its ILRGT2.1 target, into CHO-K1 cells in order to express the heterodimeric TALEN. Activity of the TALEN was screened in our mammalian SSA assay (see Example 1).

The plasmid pair pCLS8461 and pCLS8465 was co-transformed, with its HBBT1.1 target, into CHO-K1 cells in order to express the heterodimeric TALEN. Activity of the TALEN was screened in our mammalian SSA assay (see Example 1).

TALENs activity levels in this assay indicate that they cleave their target sequence in the CHO mammalian cells (FIG. 8).

Example 7 Activity of TALE::TevI

The catalytic domain of I-TevI (SEQ ID NO: 349), a member of the GIY-YIG endonuclease family, was fused to a TAL backbone, composed of an N-terminal domain, a central core composed of RVDs and a C-terminal domain, to create a new class of TALEN (TALE::TevI). To distinguish the orientation (N-terminal vs. C-terminal) of the catalytic domain (CD) fusions, construct names are written as either CD::TALE-RVD (catalytic domain is fused N-terminal to the TALE domain) or TALE-RVD::CD (catalytic domain is fused C-terminal to the TALE domain), where “-RVD” optionally designates the sequence recognized by the TALE domain and “CD” is the catalytic domain type. Herein we describe novel TALE::TevI constructions that target, for example, the AvrBs3 sequence, thus named TALE-AvrBs3::TevI.

Example 7a Activity of TALE::TevI in Yeast

A core TALE scaffold ST2 (SEQ ID NO: 464) onto which (a) different sets of RVD domains could be inserted to change DNA binding specificity, and (b) a selection of I-TevI-derived catalytic domains could be attached, N- or C-terminal, to effect DNA cleavage (or nicking) was generated. The sT2 truncated scaffold was generated by PCR from a full-length core TALEN scaffold template (pCLS7183, SEQ ID NO: 3) using primers CMP_G061 (SEQ ID NO: 440) and CMP_G065 (SEQ ID NO: 441) and was cloned into vector pCLS7865 (SEQ ID NO: 442) to generate pCLS7865-cTAL11_CFS1 (pCLS9009, SEQ ID NO: 443), where CFS1 designates the amino acid sequence -GSSG- (with underlying restriction sites BamHI and Kpn2I in the coding DNA to facilitate cloning). Two variants of the I-TevI (SEQ ID NO: 349) catalytic domain were amplified by PCR on templates TevCreD01 (SEQ ID NO: 109 protein in plasmid pCLS6614, SEQ ID NO: 444) using the primer pair CMP_G069 (SEQ ID NO: 445) and CMP_G070 (SEQ ID NO: 446) or TevCreD02 (SEQ ID NO: 110 protein in plasmid pCLS6615, SEQ ID NO: 447) using the primer pair CMP_G069 (SEQ ID NO: 445) and CMP_G071 (SEQ ID NO: 448) and subcloned into the pCLS9009 backbone by restriction and ligation using BamHI and EagI restriction sites, yielding pCLS7865-cT11_TevD01 (pCLS9010, SEQ ID NO: 449) and pCLS7865-cT11_TevD02 (pCLS9011, SEQ ID NO: 450), respectively. Both fusions contain the dipeptide -GS- linking the TALE-derived DNA binding domain and the I-TevI-derived catalytic domain.

The DNA sequence coding for the RVDs to target the AvrBs3 site (SEQ ID NO: 4) was subcloned into both plasmids pCLS9010 (SEQ ID NO: 451) and pCLS9011 (SEQ ID NO: 452) using Type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence to create the subsequent TALE-AvrBs3::TevI constructs cT11Avr_TevD01 (pCLS9012, SEQ ID NO: 453) and cT11Avr_TevD02 (pCLS9013, SEQ ID NO: 454), respectively. These TALE-AvrBs3::TevI constructs were sequenced and the insert transferred to additional vectors as needed (see below and Example 7b).

The final TALE-AvrBs3::TevI yeast expression plasmids, pCLS8523 (SEQ ID NO: 455) and pCLS8524 (SEQ ID NO: 456), were prepared by yeast in vivo cloning using plasmids pCLS9012 and pCLS9013 (SEQ ID NO: 453 and 454), respectively. To generate an intact coding sequence by in vivo homologous recombination, approximately 40 ng of either plasmid (pCLS9012; SEQ ID NO: 453 or pCLS9013, SEQ ID NO: 454) linearized by digestion with BssHII and 1 ng of the pCLS0542 (SEQ ID NO: 2) plasmid DNA linearized by digestion with NcoI and EagI were used to transform the yeast S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200) using a high efficiency LiAc transformation protocol (Arnould et al. 2007).

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::TevI constructs were tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 3). TALE-AvrBs3::TevI activity levels on their respective targets in yeast cells are shown in FIG. 12. These data show that TALE-AvrBs3::TevI is active against several targets in yeast.
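The layout of these pseudo palindromic targets (two copies of the recognition sequence with their 3′ ends facing a spacer) can be sketched in Python as follows. The recognition sequence and spacer used in the usage note are placeholders, not the actual AvrBs3 site. The mapping from spacer length to SEQ ID NO is an inference that assumes one target per spacer length in ascending order; it agrees with the Avr15 target being SEQ ID NO: 230, but it is not stated explicitly in the text.

```python
# Complement table for unambiguous DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def pseudo_palindromic_target(site: str, spacer: str) -> str:
    """Top strand of a target: site, spacer, then the site on the opposite strand.

    Both copies of the recognition sequence have their 3' ends proximal to the spacer.
    """
    if not 5 <= len(spacer) <= 40:
        raise ValueError("spacer length must be between 5 and 40 bp")
    return site + spacer + reverse_complement(site)


def seq_id_for_spacer(length: int) -> int:
    """Assumed numbering: SEQ ID NO: 220 (5 bp spacer) up to 255 (40 bp spacer)."""
    if not 5 <= length <= 40:
        raise ValueError("spacer length must be between 5 and 40 bp")
    return 220 + (length - 5)
```

With a placeholder site `"TACCGA"` and a 5 bp spacer `"NNNNN"`, `pseudo_palindromic_target` returns `"TACCGANNNNNTCGGTA"`.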

Example 7b Activity of TALE::TevI in Mammalian Cells

DNA encoding the TALE-AvrBs3::TevI construct from either pCLS9012 (SEQ ID NO: 453) or pCLS9013 (SEQ ID NO: 454) was subcloned into the pCLS1853 (SEQ ID NO: 7) mammalian expression plasmid using AscI and XhoI restriction enzymes for the receiving plasmid and BssHII and XhoI restriction enzymes for the TALE-AvrBs3::TevI insert, leading to the mammalian expression plasmids pCLS8993 and pCLS8994 (SEQ ID NO: 457 and 458), respectively.

All mammalian target reporter plasmids containing the TALEN DNA target sequences were constructed using the standard Gateway protocol (INVITROGEN) into a CHO reporter vector (Arnould, Chames et al. 2006; Grizot, Epinat et al. 2010). The TALE-AvrBs3::TevI constructs were tested in an extrachromosomal assay in mammalian cells (CHO K1) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 3).

For this assay, CHO K1 cells were transfected in a 96-well plate format with 75 ng of target vector and an increasing quantity of each variant DNA from 0.7 to 25 ng, in the presence of PolyFect reagent (1 μL per well). The total amount of transfected DNA was brought to 125 ng (target DNA, variant DNA, carrier DNA) using an empty vector. Seventy-two hours after transfection, culture medium was removed and 150 μL of lysis/revelation buffer for β-galactosidase liquid assay was added. After incubation at 37° C., optical density was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform (Grizot, Epinat et al. 2009).
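The DNA amounts per well follow from simple arithmetic: the empty (carrier) vector makes up whatever remains of the 125 ng total after the 75 ng of target vector and the variant DNA. A minimal Python sketch, in which the function and constant names are illustrative assumptions:

```python
TARGET_NG = 75.0   # target vector per well
TOTAL_NG = 125.0   # total transfected DNA per well


def carrier_ng(variant_ng: float) -> float:
    """Amount of empty (carrier) vector needed to bring the total to 125 ng."""
    if not 0.7 <= variant_ng <= 25.0:
        raise ValueError("variant DNA is expected in the 0.7 to 25 ng range")
    return TOTAL_NG - TARGET_NG - variant_ng
```

At the 12.5 ng dose of variant DNA, for instance, 37.5 ng of empty vector completes the mix; at the maximum dose of 25 ng, 25 ng of empty vector is needed.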

Activity levels in mammalian cells for the TALE-AvrBs3::TevI constructs (12.5 ng DNA transfected) on the Avr15 target (SEQ ID NO: 230) are shown in FIG. 13. TALE-AvrBs3::TevI appears to cleave the target sequence efficiently.

Example 7c Engineering of the TALE::TevI

Variants differing by truncations of the C-terminal domain of the AvrBs3-derived TALEN (SEQ ID NO: 1) are chosen as starting scaffolds. A subset of these variants includes truncation after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40), D950 (C64), R1000 (C115), D1059 (C172) (the protein domains of truncated C-terminal domains C11 to C172 are respectively given in SEQ ID NO: 295 to 300) and P1117 [also referred to as Cter wt or WT Cter (SEQ ID NO: 400) lacking the activation domain of the C-terminal domain of natural AvrBs3 (SEQ ID NO: 1)]. The plasmids coding for the variant scaffolds containing the AvrBs3-derived N-terminal domain, the AvrBs3-derived set of repeat domains and the truncated AvrBs3-derived C-terminal domain [pCLS7821, pCLS7803, pCLS7807, pCLS7809, pCLS7811, pCLS7813, pCLS7817 (SEQ ID NO: 19 to 25) which are based on the pCLS7184 (SEQ ID NO: 5)] allow cloning of any catalytic domain in fusion to the C-terminal domain, using the restriction sites BamHI and EagI. Variants of the catalytic domain of I-TevI (SEQ ID NO: 349) are designed from the N-terminal region of I-TevI. A subset of these variants includes truncations of the catalytic domain, the deletion-intolerant region of its linker, the deletion-tolerant region of its linker and its zinc finger (SEQ ID NO: 459 to 462) (Liu, Dansereau et al. 2008).

The DNA corresponding to these variants of I-TevI is amplified by the PCR to introduce, at the DNA level, a BamHI (at the 5′ of the coding strand) and an EagI (at the 3′ of the coding strand) restriction site and, at the protein level, a linker (for example -SGGSGS- stretch, SEQ ID NO: 463) between the C-terminal domain of the TAL and the variant of the catalytic domain of I-TevI. The final TALE::TevI constructs are generated by insertion of the variant of I-TevI catalytic domains into the scaffold variants using BamHI and EagI and standard molecular biology procedures.

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The TALE-AvrBs3::TevI constructs were tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 3).

Example 8 Activity of TALE::NucA

NucA (SEQ ID NO: 355), a nonspecific endonuclease from Anabaena sp., was fused to a TALE-derived scaffold (composed of an N-terminal domain, a central core composed of RVDs and a C-terminal domain) to create a new class of cTALEN (TALE::NucA). To distinguish the orientation (N-terminal vs. C-terminal) of the catalytic domain (CD) fusions, construct names are written as either CD::TALE-RVD (catalytic domain is fused N-terminal to the TALE domain) or TALE-RVD::CD (catalytic domain is fused C-terminal to the TALE domain), where “-RVD” optionally designates the sequence recognized by the TALE domain and “CD” is the catalytic domain type. Herein, we describe novel TALE::NucA constructions that target, for example, the AvrBs3 sequence, and are thus named TALE-AvrBs3::NucA. Notably, the wild-type NucA endonuclease can be inhibited by complex formation with the NuiA protein (SEQ ID NO: 473). In a chimeric protein context, the NuiA protein can function as a protein domain to modulate the activity of NucA or TALE::NucA constructs.

Example 8a Activity of TALE::NucA in Yeast

A core TALE scaffold, sT2 (SEQ ID NO: 464), was selected onto which (a) different sets of RVD domains could be inserted to change DNA binding specificity, and (b) a selection of NucA-derived catalytic domains could be attached, N- or C-terminal, to effect DNA cleavage (or nicking). As previously mentioned, the sT2 truncated scaffold was generated by the PCR from a full-length core TALEN scaffold template (pCLS7183, SEQ ID NO: 3) using primers CMP_G061 (SEQ ID NO: 440) and CMP_G065 (SEQ ID NO: 441) and was cloned into vector pCLS7865 (SEQ ID NO: 442) to generate pCLS7865-cTAL11_CFS1 (pCLS9009, SEQ ID NO: 443), where CFS1 designates the amino acid sequence -GSSG- (with underlying restriction sites BamHI and Kpn2I in the coding DNA to facilitate cloning). The NucA (SEQ ID NO: 355) catalytic domain, corresponding to amino acid residues 25 to 274, was subcloned into the pCLS9009 backbone (SEQ ID NO: 443) by restriction and ligation using BamHI and EagI restriction sites, yielding pCLS7865-cT11_NucA (pCLS9937, SEQ ID NO: 465). The fusion contains the dipeptide -GS- linking the TALE-derived DNA binding domain and NucA-derived catalytic domain. The cloning step also adds, at the amino acid level, an AAD sequence at the C-terminus of the NucA catalytic domain.
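At the protein level, the fusion described above amounts to taking residues 25 to 274 of NucA (1-based, inclusive, hence 250 residues), joining them to the scaffold through -GS-, and ending with the AAD sequence brought by the cloning step. The Python sketch below illustrates that bookkeeping only; the function names are assumptions, and any sequences passed in are placeholders rather than the sequences of the listing.

```python
def extract_residues(protein: str, first: int, last: int) -> str:
    """Return residues first..last of a protein, using 1-based inclusive numbering."""
    if not 1 <= first <= last <= len(protein):
        raise ValueError("residue range falls outside the protein")
    return protein[first - 1:last]


def fuse_c_terminal(tale_scaffold: str, catalytic_domain: str,
                    linker: str = "GS", c_tail: str = "AAD") -> str:
    """Simplified protein-level view of a TALE::CD fusion.

    The default linker (-GS-) and C-terminal tail (AAD) follow the text;
    everything else about the real constructs is ignored here.
    """
    return tale_scaffold + linker + catalytic_domain + c_tail
```

Applied to a placeholder protein of 300 residues, `extract_residues(protein, 25, 274)` returns a 250-residue fragment, which `fuse_c_terminal` then places between the -GS- linker and the AAD tail.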

The DNA sequence coding for the RVDs to target the AvrBs3 site (SEQ ID NO: 4) was subcloned into plasmid pCLS9937 (SEQ ID NO: 465) using Type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence to create the subsequent TALE-AvrBs3::NucA construct cT11Avr_NucA (pCLS9938, SEQ ID NO: 466). The TALE-AvrBs3::NucA construct was sequenced and the insert transferred to additional vectors as needed (see below).

The final TALE-AvrBs3::NucA yeast expression plasmid, pCLS9924 (SEQ ID NO: 467), was prepared by yeast in vivo cloning using plasmid pCLS9938 (SEQ ID NO: 466). To generate an intact coding sequence by in vivo homologous recombination, approximately 40 ng of plasmid (pCLS9938) linearized by digestion with BssHII and 1 ng of the pCLS0542 (SEQ ID NO: 2) plasmid DNA linearized by digestion with NcoI and EagI were used to transform the yeast S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200) using a high efficiency LiAc transformation protocol (Arnould et al. 2007).

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::NucA construct was tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 8). In addition, constructs were tested on a target having only a single AvrBs3 recognition site (SEQ ID NO: 468; Table 8).

Example 8b Engineering of the TALE::NucA

Variants differing by truncations of the C-terminal domain of the AvrBs3-derived TALEN (SEQ ID NO: 5) are chosen as starting scaffolds. A subset of these variants includes truncation after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40), D950 (C64), R1000 (C115), D1059 (C172) (the protein domains of truncated C-terminal domains C11 to C172 are respectively given in SEQ ID NO: 295 to 300) and P1117 [also referred to as Cter wt or WT Cter (SEQ ID NO: 400) lacking the activation domain of the C-terminal domain of natural AvrBs3 (SEQ ID NO: 1)]. The plasmids coding for the variant scaffolds containing the AvrBs3-derived N-terminal domain, the AvrBs3-derived set of repeat domains and the truncated AvrBs3-derived C-terminal domain [pCLS7821, pCLS7803, pCLS7807, pCLS7809, pCLS7811, pCLS7813, pCLS7817 (SEQ ID NO: 19 to 25) which are based on the pCLS7184 (SEQ ID NO: 5)] allow cloning of any catalytic domain in fusion to the C-terminal domain, using the restriction sites BamHI and EagI.

The DNA corresponding to amino acid residues 25 to 274 of NucA is amplified by the PCR to introduce, at the DNA level, a BamHI (at the 5′ of the coding strand) and an EagI (at the 3′ of the coding strand) restriction site and, at the protein level, a linker (for example -SGGSGS- stretch, SEQ ID NO: 463) between the C-terminal domain of the TALE and the NucA catalytic domain. The final TALE::NucA constructs are generated by insertion of the NucA catalytic domain into the scaffold variants using BamHI and EagI and standard molecular biology procedures. For example, scaffold variants truncated after positions P897 (C11), G914 (C28) and D950 (C64), respectively encoded by pCLS7803, pCLS7807 and pCLS7811 (SEQ ID NO: 20, 21 and 23), were fused to the NucA catalytic domain (SEQ ID NO: 355), leading to pCLS9596, pCLS9597, and pCLS9599 (SEQ ID NO: 469 to 471). The cloning step also adds, at the amino acid level, an AAD sequence at the C-terminus of the NucA catalytic domain.

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::NucA constructs were tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 8). In addition, TALE-AvrBs3::NucA constructs were tested on a target having only a single AvrBs3 recognition site (SEQ ID NO: 468). Data summarized in FIG. 14 show that TALE-AvrBs3::NucA constructs are active on all targets having at least one AvrBs3 recognition site, according to the chimeric protein of the present invention.

Example 9 Activity of TALE::ColE7

The catalytic domain of ColE7 (SEQ ID NO: 478 of ColE7 protein SEQ ID NO: 340), a nonspecific endonuclease from E. coli, was fused to a TALE-derived scaffold (composed of an N-terminal domain, a central core composed of RVDs and a C-terminal domain) to create a new class of cTALEN (TALE::ColE7). To distinguish the orientation (N-terminal vs. C-terminal) of the catalytic domain (CD) fusions, construct names are written as either CD::TALE-RVD (catalytic domain is fused N-terminal to the TALE domain) or TALE-RVD::CD (catalytic domain is fused C-terminal to the TALE domain), where “-RVD” optionally designates the sequence recognized by the TALE domain and “CD” is the catalytic domain type. Herein, we describe novel TALE::ColE7 constructions that target, for example, the AvrBs3 sequence, and are thus named TALE-AvrBs3::ColE7. Notably, the wild-type ColE7 endonuclease can be inhibited by complex formation with the Im7 immunity protein (SEQ ID NO: 474). In a chimeric protein context, the Im7 protein can function as an auxiliary protein domain to modulate the nuclease activity of ColE7 or TALE::ColE7 constructs.

Example 9a Activity of TALE::ColE7 in Yeast

A core TALE scaffold, sT2 (SEQ ID NO: 464), was selected onto which (a) different sets of RVD domains could be inserted to change DNA binding specificity, and (b) a selection of ColE7-derived catalytic domains could be attached, N- or C-terminal, to effect DNA cleavage (or nicking). As previously mentioned, the sT2 truncated scaffold was generated by the PCR from a full-length core TALEN scaffold template (pCLS7183, SEQ ID NO: 3) using primers CMP_G061 (SEQ ID NO: 440) and CMP_G065 (SEQ ID NO: 441) and was cloned into vector pCLS7865 (SEQ ID NO: 442) to generate pCLS7865-cTAL11_CFS1 (pCLS9009, SEQ ID NO: 443), where CFS1 designates the amino acid sequence -GSSG- (with underlying restriction sites BamHI and Kpn2I in the coding DNA to facilitate cloning). The ColE7 (SEQ ID NO: 478) catalytic domain was subcloned into the pCLS9009 backbone by restriction and ligation using Kpn2I and EagI restriction sites, yielding pCLS7865-cT11_ColE7 (pCLS9939, SEQ ID NO: 475). The fusion contains the tetrapeptide -GSSG- linking the TALE-derived DNA binding domain and ColE7-derived catalytic domain.

The DNA sequence coding for the RVDs to target the AvrBs3 site (SEQ ID NO: 4) was subcloned into plasmid pCLS9939 (SEQ ID NO: 475) using Type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence to create the subsequent TALE-AvrBs3::ColE7 construct cT11Avr_ColE7 (pCLS9940, SEQ ID NO: 476). The TALE-AvrBs3::ColE7 construct was sequenced and the insert transferred to additional vectors as needed (see below).

The final TALE-AvrBs3::ColE7 yeast expression plasmid, pCLS8589 (SEQ ID NO: 477), was prepared by yeast in vivo cloning using plasmid pCLS9940 (SEQ ID NO: 476). To generate an intact coding sequence by in vivo homologous recombination, approximately 40 ng of plasmid (pCLS9940) linearized by digestion with BssHII and 1 ng of the pCLS0542 (SEQ ID NO: 2) plasmid DNA linearized by digestion with NcoI and EagI were used to transform the yeast S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200) using a high efficiency LiAc transformation protocol (Arnould et al. 2007).

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::ColE7 construct was tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 8). In addition, constructs were tested on a target having only a single AvrBs3 recognition site (SEQ ID NO: 468, Table 8). TALE-AvrBs3::ColE7 activity levels on the respective targets in yeast cells are shown in FIG. 15.

Example 9b Engineering of the TALE::ColE7

Variants differing by truncations of the C-terminal domain of the AvrBs3-derived TALEN (SEQ ID NO: 5) are chosen as starting scaffolds. A subset of these variants includes truncation after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40), D950 (C64), R1000 (C115), D1059 (C172) (the protein domains of truncated C-terminal domains C11 to C172 are respectively given in SEQ ID NO: 295 to 300) and P1117 [also referred to as Cter wt or WT Cter (SEQ ID NO: 400) lacking the activation domain of the C-terminal domain of natural AvrBs3 (SEQ ID NO: 1)]. The plasmids coding for the variant scaffolds containing the AvrBs3-derived N-terminal domain, the AvrBs3-derived set of repeat domains and the truncated AvrBs3-derived C-terminal domain [pCLS7821, pCLS7803, pCLS7807, pCLS7809, pCLS7811, pCLS7813, pCLS7817 (SEQ ID NO: 19 to 25) which are based on the pCLS7184 (SEQ ID NO: 5)] allow cloning of any catalytic domain in fusion to the C-terminal domain, using the restriction sites BamHI and EagI.

The DNA corresponding to the catalytic domain of ColE7 is amplified by the PCR to introduce, at the DNA level, a BamHI (at the 5′ of the coding strand) and an EagI (at the 3′ of the coding strand) restriction site and, at the protein level, a linker (for example -SGGSGS- stretch, SEQ ID NO: 463) between the C-terminal domain of the TALE and the ColE7 catalytic domain. Additionally, variants of the ColE7 endonuclease domain that modulate catalytic activity can be generated having changes (individually or combined) at the following positions: D493, R496, K497, H545, N560 and H573 [positions refer to the amino acid sequence of the entire ColE7 protein (SEQ ID NO: 340)]. The final TALE::ColE7 constructs are generated by insertion of the ColE7 catalytic domain into the scaffold variants using BamHI and EagI and standard molecular biology procedures.

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::ColE7 constructs are tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 8). In addition, constructs are tested on a target having only a single AvrBs3 recognition site (SEQ ID NO: 468, Table 8).

Example 10 Engineering of the TALE-AvrBs3::EndoT7

Variants differing by truncations of the C-terminal domain of the AvrBs3-derived TALEN (SEQ ID NO: 5) are chosen as starting scaffolds. A subset of these variants includes truncation after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40) and D950 (C64) (the protein domains of truncated C-terminal domains C11 to C64 are respectively given in SEQ ID NO: 295 to 298). The plasmids coding for the variant scaffolds containing the AvrBs3-derived N-terminal domain, the AvrBs3-derived set of repeat domains and the truncated AvrBs3-derived C-terminal domain [pCLS7821, pCLS7803, pCLS7807, pCLS7809 and pCLS7811 (SEQ ID NO: 19 to 23), which are based on the pCLS7184 (SEQ ID NO: 5)] allow cloning of any catalytic domain in fusion to the C-terminal domain, using the restriction sites BamHI and EagI.

The DNA corresponding to amino acid residues 2-149 of EndoT7 (SEQ ID NO: 363) is amplified by the PCR to introduce, at the DNA level, a BamHI (at the 5′ of the coding strand) and an EagI (at the 3′ of the coding strand) restriction site and, at the protein level, a linker (for example -SGGSGS- stretch, SEQ ID NO: 463) between the C-terminal domain of the TALE and the EndoT7 catalytic domain. The final TALE::EndoT7 constructs are generated by insertion of the EndoT7 catalytic domain into the scaffold variants using BamHI and EagI and standard molecular biology procedures. Scaffold variants truncated after positions E886 (C0), P897 (C11), G914 (C28), L926 (C40) and D950 (C64), respectively encoded by pCLS7821, pCLS7803, pCLS7807, pCLS7809 and pCLS7811 (SEQ ID NO: 19 to 23), were fused to the EndoT7 catalytic domain (SEQ ID NO: 363), leading to pCLS9600 to pCLS9604 (SEQ ID NO: 502 to 506). The cloning step also adds, at the amino acid level, an AAD sequence at the C-terminus of the EndoT7 catalytic domain.
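The correspondence between truncation labels, scaffold plasmids and the resulting EndoT7 fusion plasmids can be kept as a small lookup table. The Python sketch below is illustrative: the `Scaffold` record and `offset_from_c0` helper are assumptions, and the pairing of each scaffold with a fusion plasmid assumes that pCLS9600 to pCLS9604 are listed in the same order as the scaffolds.

```python
from typing import Dict, NamedTuple


class Scaffold(NamedTuple):
    label: str             # truncation label, e.g. "C28"
    last_residue: str      # last AvrBs3 residue retained, e.g. "G914"
    scaffold_plasmid: str
    scaffold_seq_id: int
    fusion_plasmid: str    # TALE::EndoT7 plasmid (pairing by order is an assumption)
    fusion_seq_id: int


_ROWS = [
    ("C0", "E886", "pCLS7821"),
    ("C11", "P897", "pCLS7803"),
    ("C28", "G914", "pCLS7807"),
    ("C40", "L926", "pCLS7809"),
    ("C64", "D950", "pCLS7811"),
]

# SEQ ID NO: 19 to 23 for the scaffolds, 502 to 506 for the fusions, in listed order
SCAFFOLDS: Dict[str, Scaffold] = {
    label: Scaffold(label, residue, plasmid, 19 + i, f"pCLS{9600 + i}", 502 + i)
    for i, (label, residue, plasmid) in enumerate(_ROWS)
}


def offset_from_c0(last_residue: str) -> int:
    """Number of residues retained beyond E886, the C0 truncation point."""
    return int(last_residue[1:]) - 886
```

For these five variants, the number in each label equals the offset from E886; for example, `offset_from_c0("G914")` is 28, matching C28.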

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-AvrBs3::EndoT7 constructs were tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 220 to 255, Table 8). Data summarized in FIG. 20 show that TALE-AvrBs3::EndoT7 constructs are active on targets having two AvrBs3 recognition sites, according to the chimeric protein of the present invention.

Example 11 Replacement of the C-Terminal Domain by a Polypeptide Linker, Activity with colE7, EndoT7 and I-TevI Catalytic Heads

We generated a first library of 37 different linkers. Many of them have a common structure comprising a variable region encoding 3 to 28 amino acid residues, flanked by regions encoding an SGGSGS stretch at both the 5′ and 3′ ends (SEQ ID NO: 134 to 170 and SEQ ID NO: 403 to 439). These linkers contain XmaI and BamHI restriction sites at their 5′ and 3′ ends, respectively. The linker library is then subcloned in pCLS7183 (SEQ ID NO: 3) via the XmaI and BamHI restriction sites to replace the C-terminal domain of the AvrBs3-derived TALEN (pCLS7184, SEQ ID NO: 5). The AvrBs3-derived set of repeat domains (RVDs) or any other RVD sequences having or lacking the terminal half RVD is cloned in this backbone library plasmid as described in Example 3A. DNA from the library is obtained, after scraping of the colonies from the Petri dishes, using standard miniprep techniques. The FokI catalytic head is removed using BamHI and EagI restriction enzymes, the remaining backbone being purified using standard gel extraction techniques.
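The common linker structure described above (a variable region of 3 to 28 residues flanked on both sides by SGGSGS) can be expressed as a short check at the protein level. The Python sketch below is illustrative; the function names are assumptions, and the variable region in the usage note is a placeholder rather than a member of the library. Since only many of the 37 linkers share this structure, a real library would also contain entries that fail the check.

```python
FLANK = "SGGSGS"            # stretch encoded at both ends of the linker
MIN_VAR, MAX_VAR = 3, 28    # allowed length of the variable region


def build_linker(variable_region: str) -> str:
    """Return flank + variable region + flank for a 3 to 28 residue variable region."""
    if not MIN_VAR <= len(variable_region) <= MAX_VAR:
        raise ValueError("variable region must contain 3 to 28 residues")
    return FLANK + variable_region + FLANK


def has_common_structure(linker: str) -> bool:
    """True if the linker is SGGSGS + (3 to 28 residues) + SGGSGS."""
    core_length = len(linker) - 2 * len(FLANK)
    return (linker.startswith(FLANK)
            and linker.endswith(FLANK)
            and MIN_VAR <= core_length <= MAX_VAR)
```

With the placeholder variable region `"PAT"`, `build_linker` returns `"SGGSGSPATSGGSGS"`; linkers of this form therefore span 15 to 40 residues.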

DNA coding for 3 catalytic heads presented in Table 2 (SEQ ID NO: 340, 349 and 363) was amplified by the PCR to introduce, at the DNA level, a BamHI (at the 5′ of the coding strand) and an EagI (at the 3′ of the coding strand) restriction site and, at the protein level, a linker (for example -SGGSGS- stretch, SEQ ID NO: 463) between the C-terminal domain library and the catalytic head. After BamHI and EagI digestion and purification, the DNA coding for the different catalytic heads was individually subcloned into the library scaffold previously prepared.

DNA from the final library is obtained, after scraping of the colonies from Petri dishes, using standard miniprep techniques and the resulting libraries are screened in our yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets in order to compare activity with a standard TALE-AvrBs3::FokI TALEN, which requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 3′ ends proximal and separated by “spacer” DNA containing 15, 18, 21 and 24 bps (SEQ ID NO: 230, 233, 236 and 239, Table 8). In addition, constructs were tested on a target having only a single AvrBs3 recognition site (SEQ ID NO: 468). Data summarized in FIG. 21 show the linkers of a fraction of the constructs (linkers of SEQ ID NO: 147, 150, 156 and 162 for colE7, linkers of SEQ ID NO: 134, 153, 154, 157, 162 and 166 for I-TevI and linkers of SEQ ID NO: 134, 152, 153, 159 and 166 for EndoT7) that are active on targets having one or two AvrBs3 recognition sites, according to the chimeric protein of the present invention.

Example 12 Influence of TAL Repeat Number on TALEN Activity

Because the repeat number in TAL effectors ranges from 1.5 to 33.5 (refs: 2, 24), a key question is how many repeats are needed for a TALEN to be active. To answer this question, 52 different TALENs were constructed (SEQ ID NO: 507-558) bearing from 9.5 to 15.5 TAL repeats, and their nuclease activity was tested toward homodimeric targets bearing a constant 15 bp DNA spacer (SEQ ID NO: 559-581).

Material and Methods

Tal Repeats Array Assembly and Subcloning into Yeast Expression Plasmids

The 52 different TAL repeats arrays containing from 9.5 to 15.5 TAL repeats were synthesized using a solid support method consisting of a sequential assembly of TAL repeats through consecutive restriction/ligation/washing steps as shown in FIG. 23. Briefly, as an example, to assemble the RAGT2.3 repeats array (SEQ ID NO: 582 encoding SEQ ID NO: 583), the first TAL repeat (SEQ ID NO: 584 encoding SEQ ID NO: 585) was immobilized on a solid support through biotin/streptavidin interaction, digested by SfaNI type IIS restriction endonuclease and then ligated to a second TAL repeat (SEQ ID NO: 586 encoding SEQ ID NO: 587) harboring SfaNI compatible overhangs at its 5′ end (FIG. 23B). The resulting TAL repeats array (i.e., containing TAL repeats 1 and 2) was then used as template for subsequent additions of the appropriate TAL repeats (SEQ ID NO: 588-591, encoding SEQ ID NO: 592-595 for HD, NI, respectively targeting nucleotides C, A and NN, NG respectively targeting nucleotides G and T) to generate the complete TAL repeats array RAGT2.3 (FIG. 23C). The complete TAL repeats array was finally digested by SfaNI to generate SfaNI overhangs at its 3′ end (FIG. 23D) and then stripped from the solid support using BbvI type IIS restriction endonuclease (FIG. 23E). The digested TAL repeats array was recovered and subcloned into yeast or mammalian expression plasmids harboring the N-terminal domain of AvrBs3 TAL effector and the first forty amino acids of its C-terminal domain fused to FokI type IIS restriction endonuclease (pCLS7808, i.e. SEQ ID NO: 596 encoding SEQ ID NO: 597, FIG. 23F). pCLS7808 was derived from pCLS0542 (SEQ ID NO: 2) using NcoI and XhoI restriction sites.
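The choice of repeats for a given target and their stepwise addition on the solid support can be sketched in Python as follows. Only the RVD assignments stated above (HD for C, NI for A, NN for G, NG for T) come from the text. The function names, the one-repeat-per-cycle simplification and the convention of counting the terminal half repeat as 0.5 are assumptions made for illustration, and the target in the usage note is a placeholder.

```python
from typing import List

# RVD chosen for each target nucleotide, as stated in the assembly description
RVD_FOR_BASE = {"C": "HD", "A": "NI", "G": "NN", "T": "NG"}


def rvds_for_target(target: str) -> List[str]:
    """Ordered list of RVDs for the bases of a target read 5'->3' (T0 excluded)."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def assemble_on_support(rvds: List[str]) -> List[str]:
    """Mimic the sequential assembly: immobilize repeat 1, then add one repeat per cycle."""
    if not rvds:
        raise ValueError("at least one repeat is required")
    array = [rvds[0]]            # first repeat bound to the support
    for rvd in rvds[1:]:
        # one restriction / ligation / washing cycle appends the next repeat
        array.append(rvd)
    return array


def repeat_number(rvd_count: int) -> float:
    """Repeat number when the last RVD sits on a terminal half repeat (assumed convention)."""
    return rvd_count - 0.5
```

For the placeholder target `"CAGT"`, `rvds_for_target` returns `["HD", "NI", "NN", "NG"]`; under the assumed counting convention, an array carrying 16 RVDs corresponds to 15.5 repeats.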

Results

The nuclease activities of the 52 different TALENs (SEQ ID NO: 507-558) bearing from 9.5 to 15.5 TAL repeats were tested in yeast toward their homodimeric targets (SEQ ID NO: 559-581) according to the protocol described by (refs: 1, 9). Our yeast assay results showed that all the TALENs tested were active (FIG. 24).

Example 13 Influence of N and N-1 Bases Degeneracy on TALEN Activity

TAL effector DNA binding domains are known to be highly specific with respect to their cognate target (refs: 3, 25). This has been demonstrated for different TAL DNA binding domains by independent studies. However, the influence of TAL repeat number on such specificity is unclear. To address this question in a systematic manner, RAGT2.3 and RAGT2.4 TALENs were chosen as models, the number of their TAL repeats was iteratively reduced (15.5, 13.5, 11.5 and 9.5 TAL repeats) according to the assembly method described in Example 12, and their TALEN activity was characterized toward their respective homodimeric targets degenerated in positions N and N-1 (FIG. 25, SEQ ID NO: 602-615). For the sake of clarity, FIG. 25A displays the different components of a TALEN including the N- and C-terminal domains, the TAL DNA binding domain bearing the terminal half repeat and the FokI catalytic domain. FIG. 25B displays the organization of the homodimeric RAGT2.4 TALEN targets (SEQ ID NO: 601) used for our experiments including the location of thymine T0 and the positions N and N-1 degenerated in this study. FIG. 25C displays two examples of TALEN bearing 15.5 or 11.5 TAL repeats along with their respective DNA targets (top and bottom respectively, SEQ ID NO: 601).

Our results showed that the activities of RAGT2.3 and RAGT2.4 TALENs bearing 15.5 TAL repeats (SEQ ID NO: 616 and 617) were not significantly affected by a single DNA/protein mismatch at the N or N-1 position (FIG. 25D). However, reducing the number of TAL repeats to 13.5, 11.5 and 9.5 (SEQ ID NO: 618-620 and SEQ ID NO: 621-623) increased the sensitivity of the TALENs to a DNA/protein mismatch at the N or N-1 position. Thus, our results demonstrated that TALEN sensitivity to DNA mismatches can be modulated by varying the number of TAL repeats constituting its DNA binding domain.
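The degenerate targets used in this example can be pictured with a short Python sketch that enumerates the single-base variants of a recognition sequence at positions N and N-1. The sketch rests on two assumptions that are not stated in the text: that an array of x.5 repeats reads x + 0.5 bases, and that position N is the last base read by the array. The function names and the example sequence are hypothetical.

```python
BASES = "ACGT"


def single_base_variants(target, index):
    """Return the three variants of target that differ only at a 0-based index."""
    original = target[index]
    return [
        target[:index] + base + target[index + 1:]
        for base in BASES
        if base != original
    ]


def n_and_n_minus_1_variants(target, n_repeats):
    """Enumerate mismatched targets at positions N and N-1.

    Assumptions (hypothetical, for illustration only): an array of
    n_repeats (e.g. 11.5) reads int(n_repeats + 0.5) bases, and position N
    is the last base read by the array.
    """
    target = target.upper()
    n_bases = int(n_repeats + 0.5)
    if n_bases < 2 or n_bases > len(target):
        raise ValueError("repeat number is incompatible with the target length")
    pos_n = n_bases - 1  # 0-based index of position N
    return {
        "N": single_base_variants(target, pos_n),
        "N-1": single_base_variants(target, pos_n - 1),
    }


# Hypothetical 16-nucleotide recognition sequence, not taken from this document.
example_target = "GATTACAGATTACAGA"
variants_11_5 = n_and_n_minus_1_variants(example_target, 11.5)
```

Each position yields three mismatched targets, so a given TALEN length would be compared against six single-mismatch targets in addition to the matched target under this reading.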

Example 14 Novel Variations of the TALE::FokI Scaffold

The catalytic domain of FokI (SEQ ID NO: 600), starting at residue P381, was fused to a TALE-derived scaffold (composed of an N-terminal domain, a central core composed of RVDs and a C-terminal domain) to create a half-TALEN. To distinguish the orientation (N-terminal vs. C-terminal) of the catalytic domain (CD) fusions, construct names are written as either CD::TALE-RVD (catalytic domain is fused N-terminal to the TALE domain) or TALE-RVD::CD (catalytic domain is fused C-terminal to the TALE domain), where “-RVD” optionally designates the sequence recognized by the TALE domain and “CD” is the catalytic domain type. Herein, we describe FokI::TALE constructions that either work together with other FokI::TALE constructions in a conventional “head-to-head” configuration or can be paired with TALE::FokI constructions in a novel “tail-to-head” configuration, allowing for targeting a single DNA strand (when considering the requisite T₀ as 5′ for target readout).

A core TALE scaffold, sT2 (SEQ ID NO: 464), was selected onto which (a) different sets of RVD domains could be inserted to change DNA binding specificity, and (b) a selection of FokI-derived catalytic domains could be attached, N- or C-terminal, to effect DNA cleavage (or nicking). Two standard fusion scaffolds were generated: (1) pCLS7865-cTAL11_NFS1 (pCLS9008, SEQ ID NO: 624), where NFS1 designates the amino acid sequence -GSSG- (with underlying restriction sites BamHI and Kpn2I in the coding DNA to facilitate cloning), and (2) pCLS7865-cTAL11_CFS1 (pCLS9009, SEQ ID NO: 660), where CFS1 designates the amino acid sequence -GSSG- (with underlying restriction sites BamHI and Kpn2I in the coding DNA to facilitate cloning).

Example 14a Activity of FokI::TALE in Yeast

The catalytic domain of FokI (SEQ ID NO: 600) was subcloned by restriction and ligation into pCLS9008 (SEQ ID NO: 624) using NcoI and BamHI restriction sites, yielding the construct FokI_cT11 (SEQ ID NO: 625). The fusion contains the peptide -GSSG- linking the TALE-derived DNA binding domain and FokI derived catalytic domain.

The DNA sequence coding for the RVDs to target the AvrBs3 site (SEQ ID NO: 626) was subcloned into the FokI_cT11 (SEQ ID NO: 625) scaffold using Type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence to create the subsequent FokI::TALE-AvrBs3 construct FokI_cT11Avr (SEQ ID NO: 627). This construct was sequenced and the insert transferred to additional vectors as needed.

The final FokI::TALE-AvrBs3 yeast expression plasmid, pCLS8674 (SEQ ID NO: 628), was prepared by restriction and ligation of the FokI_cT11Avr (SEQ ID NO: 627) insert into pCLS0542 (SEQ ID NO: 2) using NcoI and EagI restriction enzymes. Plasmid pCLS8674 (SEQ ID NO: 628) was used to transform the yeast S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200) using a high efficiency LiAc transformation protocol (Arnould et al. 2007).

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The FokI::TALE-AvrBs3 construct was tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on pseudo palindromic targets, since the construct requires two binding sites for activity. AvrBs3 targets contain two identical recognition sequences juxtaposed with the 5′ ends proximal (FIG. 9 D) and separated by “spacer” DNA ranging from 5 to 35 bps (SEQ ID NO: 629 to 659, Table 9). FokI::TALE-AvrBs3 activity levels on the respective targets in yeast cells are shown in FIG. 27.
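The layout of the pseudo palindromic targets described above can be sketched in Python as follows. The sketch assumes that "5′ ends proximal" means the top strand reads as the reverse complement of the recognition sequence, then the spacer, then the recognition sequence itself, so that the 5′ end of each site (each on its own strand) faces the spacer. The recognition sequence, the `N` filler used for the spacer and the function names are hypothetical placeholders; the actual targets are those of SEQ ID NO: 629 to 659.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pseudo_palindromic_target(site, spacer):
    """Top strand with two copies of site whose 5' ends face the spacer.

    Assumption: the left copy lies on the bottom strand, so it appears on
    the top strand as the reverse complement of the site.
    """
    return reverse_complement(site) + spacer + site.upper()


def spacer_series(site, min_len=5, max_len=35, filler="N"):
    """Build one target per spacer length; the filler base is a placeholder."""
    return {
        length: pseudo_palindromic_target(site, filler * length)
        for length in range(min_len, max_len + 1)
    }


# Hypothetical short recognition sequence, not the AvrBs3 site itself.
example_series = spacer_series("GATTC")
```

With spacers from 5 to 35 bp inclusive, the series contains 31 targets, which matches the number of sequences in the range SEQ ID NO: 629 to 659.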

Example 14b Activity of the Combination TALE::FokI and FokI::TALE in Yeast

The catalytic domain of FokI (SEQ ID NO: 600) was subcloned by restriction and ligation into pCLS9009 (SEQ ID NO: 660) using Kpn2I and EagI restriction sites, yielding the construct cT11_FokI (SEQ ID NO: 661). The fusion contains the peptide -GSSG- linking the TALE-derived DNA binding domain and the FokI-derived catalytic domain.

The DNA sequence coding for the RVDs to target the RagT2-R site (SEQ ID NO: 662) was subcloned into the cT11_FokI (SEQ ID NO: 661) scaffold using Type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence to create the subsequent TALE-RagT2-R::FokI construct cT11RagT2-R_FokI (SEQ ID NO: 663). This construct was sequenced and the insert transferred to additional vectors as needed.

The final TALE-RagT2-R::FokI yeast expression plasmid, pCLS9827 (SEQ ID NO: 664), was prepared by restriction and ligation of the cT11RagT2-R_FokI (SEQ ID NO: 663) insert into pCLS7763 (SEQ ID NO: 665) using NcoI and EagI restriction enzymes. The plasmid pair pCLS9827 (SEQ ID NO: 664) and pCLS8674 (SEQ ID NO: 628) was then used in co-transformation experiments in the standard yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

All the yeast target reporter plasmids containing the TALEN DNA target sequences were constructed as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

The TALE-RagT2-R::FokI/FokI::TALE-AvrBs3 construct pairs were tested in a yeast SSA assay as previously described (International PCT Applications WO 2004/067736 and in Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on asymmetric RagT2-R/AvrBs3 hybrid targets and compared with a parent standard TALEN (e.g. pCLS8674, SEQ ID NO: 628) on appropriate pseudo palindromic targets (e.g. SEQ ID NO: 629 to 659, Table 9). RagT2-R/AvrBs3 hybrid targets contain two different recognition sequences juxtaposed with the 3′ end of the first (RagT2-R) proximal to the 5′ end of the second (AvrBs3) and separated by “spacer” DNA ranging from 5 to 40 bps (SEQ ID NO: 666 to 701, Table 10). TALE-RagT2-R::FokI/FokI::TALE-AvrBs3 activity levels on the respective targets in yeast cells are shown in FIG. 28.
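The "tail-to-head" hybrid target layout described above can be sketched in Python as follows. Both recognition sequences are placed on the same strand, with the 3′ end of the first site followed by the spacer and then the 5′ end of the second site. The two site sequences, the `N` spacer filler and the function names are hypothetical placeholders; the actual targets are those of SEQ ID NO: 666 to 701.

```python
def hybrid_target(first_site, spacer, second_site):
    """Same-strand layout: first site (5'->3'), spacer, second site (5'->3')."""
    return first_site.upper() + spacer + second_site.upper()


def hybrid_series(first_site, second_site, min_len=5, max_len=40, filler="N"):
    """Build one hybrid target per spacer length; the filler is a placeholder."""
    return {
        length: hybrid_target(first_site, filler * length, second_site)
        for length in range(min_len, max_len + 1)
    }


# Hypothetical stand-ins for the RagT2-R and AvrBs3 recognition sequences.
example_first_site = "GATTC"
example_second_site = "TACCA"
example_hybrids = hybrid_series(example_first_site, example_second_site)
```

With spacers from 5 to 40 bp inclusive, the series contains 36 targets, which matches the number of sequences in the range SEQ ID NO: 666 to 701. Unlike the pseudo palindromic layout, neither site is reverse complemented, which reflects the single-strand targeting described for this configuration.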

LIST OF CITED REFERENCES

1. Arnould, S., P. Chames, et al. (2006). “Engineering of large numbers of highly specific homing endonucleases that induce recombination on novel DNA targets.” J Mol Biol 355(3): 443-58.
2. Boch, J. and U. Bonas (2010). “Xanthomonas AvrBs3 family-type III effectors: discovery and function.” Annu Rev Phytopathol 48: 419-36.
3. Boch, J., H. Scholze, et al. (2009). “Breaking the code of DNA binding specificity of TAL-type III effectors.” Science 326(5959): 1509-12.
4. Bogdanove, A. J., S. Schornack, et al. (2010). “TAL effectors: finding plant genes for disease and defense.” Curr Opin Plant Biol 13(4): 394-401.
5. Bonas, U., R. E. Stall, et al. (1989). “Genetic and structural characterization of the avirulence gene avrBs3 from Xanthomonas campestris pv. vesicatoria.” Mol Gen Genet 218(1): 127-36.
6. Cermak, T., E. L. Doyle, et al. (2011). “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting.” Nucleic Acids Res.
7. Chames, P., J. C. Epinat, et al. (2005). “In vivo selection of engineered homing endonucleases using double-strand break induced homologous recombination.” Nucleic Acids Res 33(20): e178.
8. Christian, M., T. Cermak, et al. (2010). “Targeting DNA double-strand breaks with TAL effector nucleases.” Genetics 186(2): 757-61.
9. Epinat, J. C., S. Arnould, et al. (2003). “A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells.” Nucleic Acids Res 31(11): 2952-62.
10. Grizot, S., J. C. Epinat, et al. “Generation of redesigned homing endonucleases comprising DNA-binding domains derived from two different scaffolds.” Nucleic Acids Res 38(6): 2006-18.
11. Gurlebeck, D., B. Szurek, et al. (2005). “Dimerization of the bacterial effector protein AvrBs3 in the plant cell cytoplasm prior to nuclear import.” Plant J 42(2): 175-87.
12. Li, T., S. Huang, et al. (2011). “TAL nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA-cleavage domain.” Nucleic Acids Res 39(1): 359-72.
13. Li, T., S. Huang, et al. (2011). “Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes.” Nucleic Acids Res.
14. Liu, Q., J. T. Dansereau, et al. (2008). “Role of the interdomain linker in distance determination for remote cleavage by homing endonuclease I-TevI.” J Mol Biol 379(5): 1094-106.
15. Miller, J. C., S. Tan, et al. (2010). “A TALE nuclease architecture for efficient genome editing.” Nat Biotechnol 29(2): 143-8.
16. Moore, I., M. Samalova, et al. (2006). “Transactivated and chemically inducible gene expression in plants.” Plant J 45(4): 651-83.
17. Moscou, M. J. and A. J. Bogdanove (2009). “A simple cipher governs DNA recognition by TAL effectors.” Science 326(5959): 1501.
18. Padidam, M. (2003). “Chemically regulated gene expression in plants.” Curr Opin Plant Biol 6(2): 169-77.
19. Romer, P., S. Recht, et al. (2009). “A single plant resistance gene promoter engineered to recognize multiple TAL effectors from disparate pathogens.” Proc Natl Acad Sci USA 106(48): 20526-31.
20. Smith, J., S. Grizot, et al. (2006). “A combinatorial approach to create artificial homing endonucleases cleaving chosen sequences.” Nucleic Acids Res.
21. Szurek, B., O. Rossier, et al. (2002). “Type III-dependent translocation of the Xanthomonas AvrBs3 protein into the plant cell.” Mol Microbiol 46(1): 13-23.
22. Wang, R., X. Zhou, et al. (2003). “Chemically regulated expression systems and their applications in transgenic plants.” Transgenic Res 12(5): 529-40.
23. Zuo, J. and N. H. Chua (2000). “Chemical-inducible systems for regulated expression of plant genes.” Curr Opin Biotechnol 11(2): 146-51.
24. Kay, S. and U. Bonas (2009). “How Xanthomonas type III effectors manipulate the host plant.” Curr Opin Microbiol 12: 37-43.
25. Scholze, H. and J. Boch. “TAL effector-DNA specificity.” Virulence 1: 428-32.
26. Mak, A. N., P. Bradley, R. A. Cernadas, A. J. Bogdanove and B. L. Stoddard. “The crystal structure of TAL effector PthXo1 bound to its DNA target.” Science 335: 716-9.

1. A chimeric protein derived from a Transcription Activator-Like Effector (TALE) comprising: (i) a core scaffold comprising Repeat Variable Dipeptide regions (RVDs) having specificity to bind a nucleic acid sequence adjacent to a nucleic acid target sequence to process; (ii) a catalytic domain to process said nucleic acid target sequence.
 2. The chimeric protein according to claim 1, further comprising a peptidic linker to fuse said catalytic domain to said core scaffold.
 3. The chimeric protein according to claim 1, wherein said catalytic domain is a polypeptide comprising an amino acid sequence having at least 80%, preferably at least 90%, more preferably at least 95% identity with any of SEQ ID NO: 339 to 397 and SEQ ID NO: 598-599.
 4. The chimeric protein according to claim 3, wherein said catalytic domain has an identity with I-TevI (SEQ ID NO: 349).
 5. The chimeric protein according to claim 4, wherein said catalytic domain having an identity with I-TevI (SEQ ID NO: 349) is fused to the N-terminal domain of said core scaffold.
 6. The chimeric protein according to claim 4, wherein said catalytic domain having an identity with I-TevI (SEQ ID NO: 349) is fused to the C-terminal domain of said core scaffold.
 7. The chimeric protein according to claim 3, wherein said catalytic domain has an identity with NucA (SEQ ID NO: 355).
 8. The chimeric protein according to claim 7, wherein said catalytic domain having an identity with NucA (SEQ ID NO: 355) is fused to the N-terminal domain of said core scaffold.
 9. The chimeric protein according to claim 7, wherein said catalytic domain having an identity with NucA (SEQ ID NO: 355) or derivatives thereof is fused to the C-terminal domain of said core scaffold.
 10. The chimeric protein according to claim 3, wherein said catalytic domain has an identity with ColE7 (SEQ ID NO: 340).
 11. The chimeric protein according to claim 10, wherein said catalytic domain having an identity with ColE7 (SEQ ID NO: 340) is fused to the N-terminal domain of said core scaffold.
 12. The chimeric protein according to claim 10, wherein said catalytic domain having an identity with ColE7 (SEQ ID NO: 340) is fused to the C-terminal domain of said core scaffold.
 13. The chimeric protein according to claim 1, wherein said core scaffold comprises one additional catalytic domain at the N-terminus of said set of Repeat Variable Dipeptide regions.
 14. The chimeric protein according to claim 13, wherein said additional catalytic domain at the N-terminus of said core scaffold comprises an amino acid sequence having at least 80%, preferably at least 90%, more preferably at least 95% identity with Fok-1 (SEQ ID NO: 600).
 15. The chimeric protein according to claim 13, wherein the catalytic domains respectively fused to the C-terminus and to the N-terminus of said core scaffold are selected from those having identity with: (i) NucA domain (SEQ ID NO: 355) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus; (ii) ColE7 domain (SEQ ID NO: 340) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus; (iii) NucA domain (SEQ ID NO: 355) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus; (iv) ColE7 domain (SEQ ID NO: 340) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus; (v) NucA domain (SEQ ID NO: 355) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus; (vi) ColE7 domain (SEQ ID NO: 340) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus; (vii) FokI domain (SEQ ID NO: 600) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus; (viii) FokI domain (SEQ ID NO: 600) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus.
 16. The chimeric protein according to claim 13, wherein the catalytic domains respectively fused to the C-terminus and to the N-terminus of said core scaffold are selected from those having identity with: (i) NucA domain (SEQ ID NO: 355) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus; (ii) ColE7 domain (SEQ ID NO: 340) in N-terminus and I-TevI domain (SEQ ID NO: 349) in C-terminus; (iii) FokI domain (SEQ ID NO: 600) in N-terminus and ColE7 domain (SEQ ID NO: 340) in C-terminus; (iv) FokI domain (SEQ ID NO: 600) in N-terminus and NucA domain (SEQ ID NO: 355) in C-terminus, said chimeric protein acting as a monomer wherein two of said monomer respectively bind one nucleic acid sequence adjacent to a nucleic acid target sequence thereby together processing said nucleic target sequence.
 17. The chimeric protein according to claim 1, wherein each RVD comprises a pair of amino acids responsible for recognizing and binding nucleotides, selected from the group consisting of HD for recognizing C, NG for recognizing T, NI for recognizing A, NN for recognizing G or A, NS for recognizing A, C, G or T, HG for recognizing T, IG for recognizing T, NK for recognizing G, HA for recognizing C, ND for recognizing C, HI for recognizing C, HN for recognizing G, NA for recognizing G, SN for recognizing G or A, YG for recognizing T, TL for recognizing A, VT for recognizing A or G and SW for recognizing A.
 18. The chimeric protein according to claim 1, wherein each RVD comprises a pair of amino acids responsible for recognizing and binding nucleotides, selected from the group consisting of NN or NG for recognizing G, HD for recognizing C, NG for recognizing T, VT for recognizing A or G and NI or TL or SW for recognizing A.
 19. A recombinant polynucleotide encoding a chimeric protein according to claim 1.
 20. A vector comprising a recombinant polynucleotide according to claim 19.
 21. A composition comprising a chimeric protein according to claim 1 and a carrier.
 22. A host cell which comprises a recombinant polynucleotide of claim 19.
 23. A non-human transgenic animal which comprises a recombinant polynucleotide of claim 19.
 24. A non-human transgenic animal which comprises a vector of claim 20.
 25. A transgenic plant which comprises a recombinant polynucleotide of claim 19.
 26. A transgenic plant which comprises a vector of claim 20.
 27. A kit comprising a chimeric protein according to claim 1 and instructions for use.